It's spring 2014 and that means it is only a matter of months before people the world over are buried under an avalanche of public polls, purporting to show that some thing, leader, or law is about to be heaved upon them by majority vote. From Slovakia, to the United States, to Indonesia, to Bangladesh, to Sweden, few populations will escape the year unscathed.

Typically, such polls are posed to a small subset of the population (\(N\) people polled out of \(N_{pop}\), with \(N \ll N_{pop}\)) as a binary choice: "Do you support Party X or Party Y?", "Person 1 or Person 2?", "Are you for or against Issue A?". The results are used to infer the preference of the population at large, to within some margin of error, e.g. "Thing X is preferred by 56.3% \(\pm\) 4.2% of the population".

This raises some questions:

- How is the population sampled?
- What is the probability model?
- Where does the \(\pm\) come from and how is it calculated?
- How does the \(\pm\) depend on the number of people polled?

These issues are not trivial and can be difficult to get right. The Chicago Tribune famously blew the call on the United States Presidential election of 1948, calling the election for Thomas E. Dewey when in fact Harry Truman had won.

On the first question: a polling agency will attempt to draw a random sample whose mixture of people reflects, as nearly as possible, the known distribution of income, education, race, religion, etc. in the total population, as measured by a census or some other survey.
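
One simple way to aim for such a mixture is to give each demographic group a number of interviews proportional to its share of the census. The sketch below illustrates that idea only; it does not describe how any particular agency works. The function name `allocate_quotas`, the group labels, and the census counts are all hypothetical.

```python
def allocate_quotas(census_counts, n_polled):
    """Split n_polled interviews across groups in proportion to census counts.

    Uses the largest-remainder method with integer arithmetic, so the
    quotas always sum to exactly n_polled.
    """
    total = sum(census_counts.values())
    quotas, remainders = {}, {}
    for group, count in census_counts.items():
        # Whole interviews for this group, plus the leftover fraction (as an integer).
        quotas[group], remainders[group] = divmod(n_polled * count, total)
    leftover = n_polled - sum(quotas.values())
    # Give the unassigned interviews to the groups with the largest remainders.
    for group in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        quotas[group] += 1
    return quotas


# Hypothetical census counts by education level (made-up numbers).
census_counts = {"no_degree": 3_500_000, "secondary": 4_500_000, "university": 2_000_000}
quotas = allocate_quotas(census_counts, n_polled=1000)
print(quotas)
```

Within each group, respondents would still be chosen at random; the quotas only fix how many people to draw from each group.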

Implicit in multiple-choice polling questions is the assumption that, despite rich differences between people, each person can be approximated by their choice from a limited set of predetermined options. For simplicity, let's say the question is \(\mathbb{Q}\) and that people can respond in one of two ways, as above. If they're for one choice we count them in group \(\mathbb{A}\), which has \(A\) people in the full population. If they're for the other choice, they're in group \(\mathbb{B}\), which has \(B\) people.

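By asking a total of \(N\) people at random, we're effectively doing the same thing as when we pick random samples (with replacement) from a bag of colored marbles. Strictly speaking, a poll does not ask the same person twice, but because \(N \ll N_{pop}\) the difference is negligible. Each randomly selected person has probability \(\displaystyle p_A = \frac{A}{A+B}\) of having opinion \(\mathbb{A}\), and probability \(\displaystyle p_B =\frac{B}{A+B} = 1-p_A\) of having opinion \(\mathbb{B}\). Therefore, the number of people among the \(N\) polled who hold opinion \(\mathbb{A}\) follows the binomial distribution with parameters \(N\) and \(p_A\).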
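
To see the marble-bag picture in action, here is a minimal simulation sketch. It draws many small polls from a hypothetical population and compares how often each count \(k\) of A answers turns up with the binomial probability \(\binom{N}{k} p_A^k \, p_B^{N-k}\). The population sizes, poll size, number of simulated polls, and function names are all assumptions chosen for illustration.

```python
import math
import random


def binomial_pmf(k, n, p):
    """Probability of exactly k 'A' answers among n independent respondents."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def simulate_poll(n_polled, p_a, rng):
    """Count 'A' answers when n_polled people are drawn with replacement."""
    return sum(rng.random() < p_a for _ in range(n_polled))


# Hypothetical population: 5600 people hold opinion A, 4400 hold opinion B.
A, B = 5600, 4400
p_A = A / (A + B)

N = 20                  # people asked in each simulated poll
n_polls = 20000         # number of simulated polls
rng = random.Random(0)  # fixed seed so the run is reproducible

# counts[k] = number of simulated polls in which exactly k people answered A.
counts = [0] * (N + 1)
for _ in range(n_polls):
    counts[simulate_poll(N, p_A, rng)] += 1

for k in range(N + 1):
    simulated = counts[k] / n_polls
    print(f"k={k:2d}  simulated={simulated:.4f}  binomial={binomial_pmf(k, N, p_A):.4f}")
```

The two columns should agree closely, with small differences that shrink as `n_polls` grows.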

The third and fourth questions are our main focus, and we'll discuss them next.

## Comments

Hey Josh, excuse me for commenting here, but I couldn't find any other way to thank you for the amazing notes you are writing in the new mechanics section. I've started learning physics from them, because they're awesome. – Jordi Bosch · 2 years, 7 months ago

– Josh Silverman Staff · 2 years, 7 months ago

Thank you Jordi. That is very encouraging news, and it makes me want to write even more.

– Jordi Bosch · 2 years, 7 months ago

You can delete this comment once you have read it. It doesn't fit with the article.