Research studies, opinion polls and surveys all rely on asking a number of people about something to try to extract some pattern of behaviour or predict a result.
But how many people do you need to ask for that finding to have any convincing meaning?
Before any election you’ll always hear some politician casting doubt on opinion polls, saying: “There’s only one poll that matters.” They try to make us believe that those headline-grabbing polls count for nothing compared with the real election poll of those registered to vote.
But opinion polls are useful because they can give a rapid insight into people’s intentions.
Taking small samples from large populations is a valid statistical technique for getting accurate information about the wider population, for a fraction of the time and cost.
This applies wherever we have large or hard-to-measure populations.
So how big does a sample need to be for its results to be reliable? Well, that depends.
Margin of error
All sample estimates have a margin of error, which accounts for the imperfection of the sample compared with the population. For example, a recent Newspoll put Labor 2% ahead of the Coalition on a two-party preferred basis.
Newspoll says it surveyed 1,728 people, with a maximum sampling error of ± 2.4%. This means the largest plausible win for Labor would be 4.4% (2% plus 2.4% margin of error), but it’s also plausible it could lose by 0.4% (2% minus 2.4%).
For this tight race we might want to reduce our margin of error by increasing our sample size. But that will be costly, because the gains in accuracy diminish as the sample grows: halving the margin of error takes roughly four times as many respondents. A sample of roughly 2,400 people would be needed to reduce the margin of error to ± 2%, and a massive sample of 9,600 to reduce it to ± 1%.
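The sample sizes quoted above can be checked with a few lines of arithmetic. The sketch below is not Newspoll's own method, which the article does not describe. It assumes a simple random sample, a 95% confidence level (z = 1.96) and the worst-case proportion of 50%, and the function names are ours. Under those assumptions it gives figures close to the ones quoted.

```python
import math

Z_95 = 1.96  # assumed: normal quantile for a 95% confidence level


def margin_of_error(n, p=0.5, z=Z_95):
    """Margin of error for a proportion p estimated from n respondents.

    Assumes a simple random sample; p=0.5 gives the largest (worst-case) margin.
    """
    return z * math.sqrt(p * (1 - p) / n)


def sample_size_for(margin, p=0.5, z=Z_95):
    """Smallest sample size whose margin of error is at most `margin`."""
    exact = z ** 2 * p * (1 - p) / margin ** 2
    # Round first so tiny floating-point noise does not push ceil() up by one.
    return math.ceil(round(exact, 6))


newspoll_margin = margin_of_error(1728)  # about 0.024, i.e. +/- 2.4%
n_for_two_percent = sample_size_for(0.02)  # roughly 2,400
n_for_one_percent = sample_size_for(0.01)  # roughly 9,600

print(f"n=1728 gives +/- {100 * newspoll_margin:.1f}%")
print(f"+/- 2% needs about {n_for_two_percent} people")
print(f"+/- 1% needs about {n_for_one_percent} people")
```

Because the margin shrinks with the square root of the sample size, each halving of the margin costs about four times as many interviews.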
Quality matters as well as quantity
Survey estimates and their margins of error are only valid if the sampling has been well conducted. If the sampling is biased then larger sample sizes likely just give us high confidence around an inaccurate estimate.
Survey samples are often biased because they differ from the population in important ways. With 12.6 million respondents, the 2017 same-sex marriage survey is a good example, as it clearly overrepresented older people, who were more likely to return their postal survey.
Fortunately, in this case the bias does not undermine the result, which was a resounding vote for marriage equality. But the estimate of 61.6% in favour of marriage equality, with a tiny margin of error of 0.03%, may not accurately reflect the opinion of the Australian population.
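To illustrate how a tiny margin of error can sit alongside a much larger bias, here is a minimal sketch. The first part uses the article's figures (61.6% and 12.6 million respondents) with an assumed 95% confidence level. The second part is entirely hypothetical: the two age groups, their levels of support and their response rates are invented for illustration and are not estimates from the actual survey.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Margin of error for proportion p from n respondents (95% level assumed)."""
    return z * math.sqrt(p * (1 - p) / n)


# Figures from the article: 61.6% in favour, 12.6 million respondents.
survey_margin = margin_of_error(0.616, 12_600_000)

# HYPOTHETICAL population: (share of population, true support, response rate).
groups = {
    "younger": (0.5, 0.70, 0.60),
    "older": (0.5, 0.50, 0.90),
}

# What the whole population thinks.
true_support = sum(share * support for share, support, _ in groups.values())

# What the returned surveys show: each group is weighted by how often it responds.
responding = sum(share * rate for share, _, rate in groups.values())
observed_support = (
    sum(share * rate * support for share, support, rate in groups.values())
    / responding
)

bias = observed_support - true_support

print(f"Margin of error: +/- {100 * survey_margin:.2f} percentage points")
print(f"True support {true_support:.0%}, observed {observed_support:.0%}")
print(f"Bias: {100 * bias:+.1f} percentage points")
```

With these invented numbers the survey would understate support by about two percentage points, many times larger than the stated margin of error, and collecting more responses of the same kind would not shrink that gap.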
Unrepresentative samples also happen in clinical trials because high-risk patients are often excluded from trials for safety reasons.
One study found that 94% of people with asthma would have been excluded from the 17 major clinical trials used to write guidelines for doctors about treating the condition.
This is a serious problem, because doctors need to give advice to all of their patients, but the best evidence comes from trials that used generally healthier patients.
Similarly, imagine trying to predict how subscribers to Netflix or Stan will rate movies based on ratings from other similar subscribers. These ratings are likely to be biased, as only people who particularly like or dislike a given movie may bother to rate it.
This is an important problem to solve for online content distributors in order to provide accurate movie recommendations to customers.
How does the public judge a good sample?
There are no simple rules for judging a good sample size. Bigger is generally better, but only when the survey has been well conducted.
Some very large samples may have used cheap data collection tools, such as Facebook, and so may be highly skewed. Small surveys of just 25 people can be insightful, especially where efforts have been made to ensure a representative sample and chase people who don’t initially respond.
The Australian Press Council has guidelines on reporting opinion polls, and here are some questions you can ask yourself when reading about any survey:
Where were the participants found? How typical are they of the whole population of interest?
How many participants declined to respond? If only 10% of people responded then it is likely an atypical sample who have strong feelings about the survey’s subject. (Think about what surveys you’d likely respond to.)
Were the survey respondents paid? Payment will increase the response rate, but might also affect respondents’ answers.
Sadly, these details are often missing from the media releases and news reports of exciting survey findings, and also from published papers.
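The response-rate question is worth a quick calculation. The sketch below computes worst-case bounds: it assumes we know nothing about the people who did not respond, so they could all have answered "yes" or all "no". The 10% response rate echoes the example above, the 60% "yes" figure is hypothetical, and the function name is ours.

```python
def worst_case_bounds(response_rate, yes_among_respondents):
    """Range of possible population 'yes' shares when nonrespondents are unknown.

    Lower bound: every nonrespondent would have said 'no'.
    Upper bound: every nonrespondent would have said 'yes'.
    """
    known_yes = response_rate * yes_among_respondents
    lower = known_yes
    upper = known_yes + (1 - response_rate)
    return lower, upper


# Hypothetical survey: 10% respond, and 60% of those respondents say 'yes'.
low, high = worst_case_bounds(0.10, 0.60)
print(f"Population 'yes' share could be anywhere from {low:.0%} to {high:.0%}")

# The same answer from a survey with a 90% response rate is far more informative.
low_90, high_90 = worst_case_bounds(0.90, 0.60)
print(f"With 90% responding: {low_90:.0%} to {high_90:.0%}")
```

In practice nonrespondents are rarely that extreme, so these bounds are deliberately pessimistic, but they show how little a low response rate can pin down without further assumptions.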
Survey respondents can also be steered towards desirable answers. For example, a Nature survey of 1,576 researchers on the reproducibility crisis asked the question:
Which of the following statements regarding a ‘crisis of reproducibility’ within the science community do you agree with?
(i) There is a significant crisis of reproducibility
(ii) There is a slight crisis of reproducibility
(iii) There is no crisis of reproducibility
(iv) Don’t know.
A majority (52%) of people said “Yes” to a significant crisis, 7% answered “Don’t know” and just 3% “No”.
This leaves the question of what is meant by a “slight crisis”, a verdict reached by 38% of people. Did they answer slight because they are close to the “no” or “don’t know” categories, or are they close to considering it a significant crisis? We can’t tell.
Overall it’s best to read the results of any survey with healthy scepticism. Our survey of the two statisticians who wrote this article showed a 100% agreement with this statement.
Adrian Barnett, Professor of Public Health, Queensland University of Technology and Scott Sisson, Professor of Statistics at UNSW, President of the Statistical Society of Australia and a Deputy Director of the Australian Centre of Excellence in Mathematical and Statistical Frontiers, UNSW