Anyone who has lived through the COVID-19 pandemic won’t be surprised at the results of new research from UNSW Business School – that people jump to conclusions when they read about studies with relatively small sample sizes.
This doesn't just apply to the general public, either. The research, which involved nearly 4000 participants, found the same pattern across a wide variety of groups, including tertiary-level statistics students and senior business leaders.
These findings from UNSW Business School's Dr Siran Zhan, Senior Lecturer in the School of Management and Governance, show just how easily people jump to conclusions when reading about studies, making it essential that journalists – and the general public – communicate and digest this information with a critical eye.
In the study, Relative Insensitivity to Sample Sizes in Judgments of Frequency Distributions, Dr Zhan and her co-author, Dr Krishna Savani, Professor of Management at the Department of Management and Marketing at The Hong Kong Polytechnic University, show that people largely ignore sample sizes in their judgments and decisions and tend to be unduly confident in conclusions drawn from studies with as few as three participants.
"What surprised us was that when we examined samples of university-level statistics students and seasoned senior executives who are supposedly trained in their education or professional work to make judgments and decisions according to sound statistical principles, they ignored the sample size just as much as the public," says Dr Zhan.
"It is especially appalling to think many important business and public policy decisions might have been made based on unreliable results from small samples," she says.
Dr Zhan says the research shows that people might not have the correct intuition as to what counts as evidence, making it difficult to correctly use statistics and research evidence to guide their inferences and decisions.
The good news? The researchers also tested a way to prevent the spread of misinformation.
What is a sample size, and why is it important?
Early in the COVID-19 pandemic, pharmaceutical and biotechnology company Moderna (MRNA) reported that its experimental vaccine was successful in eight volunteers. While only a small group of healthy volunteers were tested, journalists were quick to report the news, which was so well received that it drove up Moderna's share price by 20 per cent.
Just hours after announcing the trial's success, Moderna sold 17.6 million shares to the public, raising US$1.3 billion. While Moderna, and several of its top executives, profited off the back of the boom, some critics say it overstated the significance of the vaccine trial and manipulated the market.
Examples like these suggest that many people give little thought to a study's sample size when drawing conclusions from articles they read in print and online.
"In other words, people's general tendency to be unduly confident in conclusions drawn from tiny samples is incommensurate with statistical principles and can lead to poor judgment and decisions," explains Dr Zhan.
So, in six experiments involving a total of 3914 respondents, she and her co-author test whether people pay attention to sample sizes that differ by one or two orders of magnitude.
The findings reveal that, when making judgments and decisions based on a single sample, people pay minimal attention to its size, even when sample sizes differ by a factor of 50, 100 or 400.
"Even with a sample size of three, participants' mean confidence level was 6.6 out of 10, indicating that people have pretty high confidence in data from incredibly small samples, consistent with prior research," explains Dr Zhan.
"As researchers, we realise that the same finding is much more believable from a sample of 3000 than from a sample of 30. However, shockingly, the general population does not appear to share this intuition," she says.
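The intuition Dr Zhan describes follows from a standard textbook result: the standard error of a sample mean equals the standard deviation divided by the square root of the sample size. The short Python sketch below is an illustration added here, not part of the study. It assumes a hypothetical outcome with a standard deviation of 10 and shows how much the uncertainty shrinks between samples of 30 and 3000, and for the 50-, 100- and 400-fold differences tested in the experiments.

```python
import math

def standard_error(sd: float, n: int) -> float:
    """Standard error of a sample mean: standard deviation divided by sqrt(n)."""
    return sd / math.sqrt(n)

# Hypothetical outcome with a standard deviation of 10 (an assumption for illustration).
sd = 10.0
se_small = standard_error(sd, 30)    # about 1.83
se_large = standard_error(sd, 3000)  # about 0.18

print(f"n=30:   SE = {se_small:.2f}")
print(f"n=3000: SE = {se_large:.2f}")
print(f"ratio:  {se_small / se_large:.1f}x")  # sqrt(3000 / 30) = 10

# Sample-size ratios from the experiments: precision improves by sqrt(factor).
for factor in (50, 100, 400):
    print(f"{factor}x larger sample -> SE {math.sqrt(factor):.1f}x smaller")
```

Because precision grows with the square root of the sample size, a sample 100 times larger gives an estimate about 10 times more precise, which is exactly the kind of difference participants in the study tended to overlook.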
What's an appropriate sample size?
With disinformation and misinformation spreading online, the ability to judge what we're presented with in the media is becoming more important than ever.
"With the proliferation of statistics in the news media and in organisations that call for evidence-based decision-making, the current findings indicate that people might not have the correct intuition as to what counts as evidence, making it difficult for them to correctly use statistics and research evidence to guide their inferences and decisions," explains Dr Zhan.
But is there such a thing as the right sample size? Bigger is generally better, statistically.
"The mean result from any sample is pulled or biased by outliers. But when your sample size increases, your sample gets closer to the population, meaning fewer estimation errors," explains Dr Zhan.
"When the sample size is small (e.g., 30), any outlier has a much stronger effect on the mean, making your mean less reliable than when the sample size is large (e.g., 3000).”
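A deterministic toy example makes the point about outliers concrete. The sketch below assumes a hypothetical dataset in which every observation equals 5 except for a single outlier of 100 (the values and the helper name `sample_with_outlier` are invented for illustration), and compares the resulting mean at n = 30 and n = 3000.

```python
from statistics import mean

def sample_with_outlier(n: int, typical: float = 5.0, outlier: float = 100.0) -> list[float]:
    """Hypothetical sample: n - 1 typical values plus a single extreme outlier."""
    return [typical] * (n - 1) + [outlier]

small = sample_with_outlier(30)
large = sample_with_outlier(3000)

# The typical value is 5; the single outlier drags each mean away from it.
print(f"n=30:   mean = {mean(small):.3f}")   # 8.167
print(f"n=3000: mean = {mean(large):.3f}")   # 5.032
```

The same outlier shifts the small-sample mean by more than 3 units but the large-sample mean by only about 0.03.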
The main drawback is the time and money needed to collect data from a very large sample.
"Put another way, when you estimate an effect from a sample (e.g., 500 customers), you are always trying to generalise your result to a population (e.g., your 13,974 existing customers), which, in reality, is too large for you to study thoroughly."
"Therefore, a trade-off must be made on sound statistical grounds so that we work with a statistically reliable yet realistically feasible sample size," she says.
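One common way to make that trade-off explicit, when the quantity of interest is a proportion (for example, the share of customers who would buy a new product), is the textbook sample-size formula with a finite population correction. This is a standard survey-sampling technique, not a method from Dr Zhan's paper. The sketch below assumes simple random sampling, 95% confidence, a worst-case proportion of 0.5 and a target margin of error of ±5 percentage points, and reuses the 13,974-customer population from the quote above; the function names are hypothetical.

```python
import math
from statistics import NormalDist

def required_sample_size(population: int, margin: float,
                         confidence: float = 0.95, p: float = 0.5) -> int:
    """Sample size needed to estimate a proportion within +/- margin.

    Textbook formula n0 = z^2 * p(1-p) / e^2, followed by the finite
    population correction n = n0 / (1 + (n0 - 1) / N).
    p = 0.5 is the most conservative (largest-sample) assumption.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    n0 = z**2 * p * (1 - p) / margin**2
    return math.ceil(n0 / (1 + (n0 - 1) / population))

def margin_of_error(n: int, population: int,
                    confidence: float = 0.95, p: float = 0.5) -> float:
    """Margin of error for a proportion from a simple random sample of n out of N."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    fpc = math.sqrt((population - n) / (population - 1))  # finite population correction
    return z * math.sqrt(p * (1 - p) / n) * fpc

N = 13_974  # existing-customer population from the example quote
print(required_sample_size(N, margin=0.05))  # 374 customers for +/-5 points
print(f"{margin_of_error(500, N):.3f}")      # about 0.043 with 500 customers
```

Under these assumptions, about 374 randomly chosen customers would be enough, and the 500-customer sample in the quote would estimate a proportion to within roughly ±4.3 percentage points.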
Study design to help prevent the spread of misinformation
Judgements and biases regarding research design and methodology don't just affect what we read in the media; these judgements permeate almost every aspect of our lives, from public policies to workplaces.
"Organisations evaluate employee performance based on a limited time window or a small number of projects (e.g., monthly sales record or past three projects). In these cases, entrepreneurs and managers need to understand that their findings, however substantive, may not be reliable if they were drawn from small samples," explains Dr Zhan.
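To see why a handful of observations is a shaky basis for judging performance, consider a 95% confidence interval for an employee's average project score. The sketch below assumes a hypothetical employee whose project scores average 80 with a standard deviation of 10, and compares the interval after three projects with the interval after 30. The t critical values are hard-coded from standard tables because Python's standard library has no Student's t distribution.

```python
import math

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom,
# taken from standard t tables (scipy.stats.t.ppf could compute them instead).
T_CRIT_95 = {2: 4.303, 29: 2.045}

def ci_half_width(sd: float, n: int) -> float:
    """Half-width of a 95% t-based confidence interval for a mean of n observations."""
    return T_CRIT_95[n - 1] * sd / math.sqrt(n)

# Hypothetical employee: project scores average 80 with a standard deviation of 10.
avg, sd = 80.0, 10.0
for n in (3, 30):
    hw = ci_half_width(sd, n)
    print(f"{n:>2} projects: 95% CI {avg - hw:.1f} to {avg + hw:.1f}")
```

With three projects the interval runs from about 55 to 105, too wide to separate a weak performer from a strong one; with 30 projects it narrows to roughly 76 to 84.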
Therefore, Dr Zhan's research holds important implications for media, journalists, policymakers, and businesses who often use results from samples (sometimes tiny samples) to inform the public and make critical decisions.
To improve decision quality, Dr Zhan suggests that all statistics should be accompanied by statistical inferences and by 'layperson interpretations' of those inferences.
"We recommend more statistical advice (i.e., a layperson interpretation of the strength of evidence statistics) to be provided to aid their interpretation of findings from samples and, ultimately, decision-making," she says.
What does this look like in practice? "For example, the Environmental Working Group provides a searchable online database with information on skincare product safety based on two primary scores: the strength of an effect (i.e., the hazard score) and the strength of evidence (i.e., data availability).
"The data availability information is equivalent to the strength of evidence information that we are advocating here," explains Dr Zhan.
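The idea of always pairing an effect with the strength of its evidence can be sketched in a few lines. The Python example below is a hypothetical illustration only: the `Finding` fields, the `evidence_label` thresholds and the wording of the labels are assumptions invented here, not rules from Dr Zhan's research or the Environmental Working Group's scoring method. It simply never reports an effect without a plain-language rating of the evidence behind it.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    effect: float        # strength of the effect (e.g., a hazard score)
    n_participants: int  # total sample size behind the effect
    n_studies: int       # number of independent studies

def evidence_label(finding: Finding) -> str:
    """Hypothetical plain-language rating of the strength of evidence.

    The thresholds are illustrative assumptions, not published criteria.
    """
    if finding.n_studies >= 3 and finding.n_participants >= 1000:
        return "robust"
    if finding.n_participants >= 100:
        return "limited"
    return "very limited"

def report(finding: Finding) -> str:
    # Always show the effect next to its evidence rating, never on its own.
    return (f"Effect: {finding.effect:+.1f} | Evidence: {evidence_label(finding)} "
            f"(n = {finding.n_participants}, independent studies = {finding.n_studies})")

# A single small trial, e.g. eight volunteers, gets flagged as very limited evidence.
print(report(Finding(effect=2.0, n_participants=8, n_studies=1)))
```

In a real reporting tool, the thresholds would need to come from a defensible statistical rule rather than the placeholder values used here.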
But what about consumers?
Consumers do not always read research articles, so research generally reaches consumers through product information, news, and books. "Therefore, we recommend that the strength of evidence statistics be presented alongside data availability information,” explains Dr Zhan.
"Consumers should be educated to question any claims unless there is strong evidence (i.e., a large amount of independent research involving large sample sizes). But educating consumers is difficult; more importantly, we think the burden must be placed on businesses, journalists, and the media," she says.