It's spring 2014 and that means it is only a matter of months before people the world over are buried under an avalanche of public polls, purporting to show that some thing, leader, or law is about to be heaved upon them by majority vote. From Slovakia, to the United States, to Indonesia, to Bangladesh, to Sweden, few populations will escape the year unscathed.
Typically, such polls are posed to a small subset of the population as a binary choice: "Do you support Party X or Party Y?", "Person 1 or Person 2?", "Are you for or against Issue A?". The results are used to infer the preference of the population at large, to within some margin of error, e.g. "Thing X is preferred by 56.3% ± 4.2% of the population".
This raises some questions:
These issues are not trivial and can be difficult to get right. The Chicago Tribune famously blew the call on the United States Presidential election of 1948, calling the election for Thomas E. Dewey when in fact Harry Truman had won.
To the first question, a polling agency will attempt to sample the population at random by targeting a mixture of people that reflects as nearly as possible the known distribution of income, education, race, religion, etc. in the total population according to a census or some other survey.
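As a rough illustration of matching a sample to known population shares, the sketch below splits a target sample size across groups in proportion to their census share. The function name `allocate_quotas`, the group labels, and the share values are all hypothetical; real polling agencies balance many variables at once and typically reweight afterwards, which this sketch does not attempt.

```python
def allocate_quotas(census_shares, sample_size):
    """Split sample_size across groups in proportion to their census share.

    Uses largest-remainder rounding so the quotas sum exactly to sample_size.
    """
    raw = {group: share * sample_size for group, share in census_shares.items()}
    quotas = {group: int(value) for group, value in raw.items()}
    leftover = sample_size - sum(quotas.values())
    # Hand the remaining seats to the groups with the largest fractional parts.
    by_remainder = sorted(raw, key=lambda g: raw[g] - quotas[g], reverse=True)
    for group in by_remainder[:leftover]:
        quotas[group] += 1
    return quotas


# Made-up shares by education level, for illustration only.
census_shares = {"no_degree": 0.35, "secondary": 0.45, "university": 0.20}
quotas = allocate_quotas(census_shares, sample_size=1000)
print(quotas)
```

Within each group, respondents would still be picked at random; the quotas only fix how many to pick from each.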
Implicit in multiple-choice polling questions is the assumption that, despite rich differences between people, each person can be approximated by their choice from a limited set of predetermined options. For simplicity, let's say the poll asks a single question and that people can respond in one of two ways, as above. If they're for one choice, we count them in group $A$, which has $N_A$ people in the full population. If they're for the other choice, they're in group $B$, which has $N_B$ people.
By asking a total of $n$ people at random, we're effectively doing the same thing as when we pick $n$ random samples (with replacement) from a bag of colored marbles. Each randomly selected person has probability $p$ of having opinion A, and probability $1 - p$ of having opinion B, where $p$ is the fraction of the full population that holds opinion A. Therefore, the number of sampled people who hold opinion A follows the binomial distribution with parameters $n$ and $p$.
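The marble analogy can be checked numerically. The sketch below simulates many small polls and compares how often each count of "opinion A" answers turns up against the binomial formula. The sample size `n`, the probability `p`, the number of trials, and the function names are assumptions chosen only for illustration.

```python
import math
import random


def binomial_pmf(k, n, p):
    """Probability that exactly k of n sampled people hold opinion A."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def simulate_poll(n, p, rng):
    """Ask n people at random; return how many hold opinion A."""
    return sum(rng.random() < p for _ in range(n))


# Hypothetical values: 20 people per poll, 56% of the population in group A.
n, p = 20, 0.56
rng = random.Random(0)  # fixed seed so the run is repeatable
trials = 20_000

counts = [0] * (n + 1)
for _ in range(trials):
    counts[simulate_poll(n, p, rng)] += 1

# Compare simulated frequencies with the binomial probabilities.
for k in range(8, 15):
    simulated = counts[k] / trials
    print(f"k={k:2d}  simulated={simulated:.3f}  binomial={binomial_pmf(k, n, p):.3f}")
```

With enough trials, the two columns should agree to within sampling noise, which is what it means for the binomial distribution to be the right model for this kind of poll.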
The third and fourth questions are our objects of focus, which we'll discuss next.