September 15, 2016

Let’s make research great again!


Last Wednesday, a lot of people all around the world woke up to a big surprise. In Europe at least, it seemed like nobody had really expected Donald Trump to win the presidential election. It didn’t take the media, and social media in particular, long to identify who was to blame for creating these false expectations in the public consciousness: the unreliable methodology of opinion research.

Just to be clear: We don’t want to take sides in the political debate, nor do we have a vote. We have not predicted the outcome of the American election or contributed to any of the electoral forecasts. In fact, Norstat does not even have a presence in the United States. However, after following the polling prior to the latest American elections as well as the media’s coverage of it, we want to share some of our thoughts on how we believe research should be done. We will question how wrong the polls really were, and present our views on how we, as providers of polls and other research services, should communicate with the public, both alongside and through the media.

We are strongly convinced that research is indeed able to predict the outcome of elections and other social phenomena, provided we rigorously live up to the standards of our industry.

Margin of error and other things from the schoolbook

As every researcher knows, the margin of error is largest when a result is close to 50%. This is why it is so hard to statistically predict the outcome of head-to-head races like the US elections and the Brexit referendum in the UK. Historically, US elections have been contests between candidates from just the two big political parties. This alone is a reason why differences behind the decimal point should not be interpreted in such forecasts.
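To make the schoolbook point concrete, here is a minimal sketch using the textbook formula for the margin of error of a proportion. It assumes a simple random sample and a 95% confidence level; the sample size of 1,000 is a hypothetical value chosen for illustration and does not come from any specific poll.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a share p estimated from a
    simple random sample of n interviews (z=1.96 ~ 95% confidence)."""
    return z * math.sqrt(p * (1 - p) / n)

n = 1000  # hypothetical number of interviews
for p in (0.10, 0.30, 0.50):
    print(f"share {p:.0%}: +/- {margin_of_error(p, n):.1%}")
```

With 1,000 interviews, a result close to 50% carries a margin of error of roughly three percentage points in either direction, which is far larger than the decimal-point differences that often make the headlines.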

However, for the US elections, it is even more complicated than that. In America, the winning candidate in most states takes it all and sends a fixed number of electors to the electoral college. This is why even small margins of error in the state polls can make the overall result quite unpredictable. It can also lead to a situation where one candidate wins more popular votes in total but still loses the election. For predictions that can be relied on, research designs need to be smarter than merely a national poll.
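A toy example may help to illustrate the winner-takes-all effect. The four states below, their elector counts and their vote shares are entirely made up, and the uniform `shift` stands in for a small, systematic polling error. This is a sketch of the mechanism, not a model of the actual election.

```python
# Hypothetical states: (electors, voters, true share for candidate A)
STATES = [
    (10, 1_000_000, 0.49),
    (10, 1_000_000, 0.49),
    (10, 1_000_000, 0.49),
    (8, 1_000_000, 0.60),
]

def tally(states, shift=0.0):
    """Return (electors for A, electors for B, A's popular vote share).

    `shift` is added to A's share in every state, mimicking a poll
    that overstates (or understates) A by the same amount everywhere.
    A state goes to A only if A's share is strictly above 50%.
    """
    electors_a = electors_b = 0
    votes_a = votes_total = 0
    for electors, voters, share_a in states:
        share = share_a + shift
        if share > 0.5:
            electors_a += electors
        else:
            electors_b += electors
        votes_a += share * voters
        votes_total += voters
    return electors_a, electors_b, votes_a / votes_total

print(tally(STATES))        # the "true" result
print(tally(STATES, 0.02))  # polls overstating A by two points
```

In this made-up setting, candidate A wins the popular vote but loses the electoral college clearly. A poll that overstates A by just two points in every state would instead predict that A wins every state.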

Especially when the subject of your research requires a very elaborate solution, there is no excuse for doing inferior research. All the well-established principles of methodology that you find in the textbooks still apply: You still need your sample to be representative. You still need to conduct a fair number of interviews. Response rates still matter. Although our industry needs to develop and change, lowering the standards of quality is not the right direction.

FiveThirtyEight lists different pollsters by the accuracy and method of their prediction. It is not surprising that there are large discrepancies, ranging from excellent quality research to completely faked or made-up results. Unfortunately, for people outside our industry, it is hard to tell whether a research company is working with methodological rigor or just going for a big headline. All we can do is encourage everyone to be completely transparent about data sources and the applied research methods or tools. Obviously, this goes for both the buyers and providers of research.

Unrepresentative sample

Donald Trump’s victory followed shortly after another “impossible” vote: Brexit. Much like the presidential election, our industry’s prediction of the Brexit referendum was also regarded as a failure. And in fact, only a few agencies predicted “leave” as the winner. To make matters worse, the Brexit prediction followed what was perhaps the most profound forecasting mistake of them all: the one for the British general election in 2015. As a consequence, that election forecast was thoroughly evaluated afterwards. The inquiry found quite a few things that could have had an effect on the polls: postal voting, question wording and framing, late swing of the voters, deliberate misreporting and social desirability. We will not go through, or repeat, the report’s findings on all these elements, but its main finding is well worth reflecting upon:

“Our conclusion is that the primary cause of the polling miss in 2015 was unrepresentative samples”.

As for the polling during the presidential campaign in America, it is too early to conclude exactly what went wrong. That being said, there is reason to believe that some of the findings for the research in Britain are relevant in the US case, as well. An interesting truth is that we – as insiders in this industry – are not particularly surprised by the main finding in the UK research. There are sectors in our industry that seem to take representativeness way too lightly. There is an ongoing debate on questions like how to recruit panels correctly, which sources are most reliable or how to maintain a top quality panel with high response rates. At Norstat, we have our clear views on the best way to recruit high quality panels, but that is beyond the scope of today.

What we would really like to discuss is the mostly ignored fact that, nowadays, a lot of sample is traded on international sample exchanges and that you can buy cheap sample without knowing its origin. Our main point here isn’t that people shouldn’t be allowed to trade sample, buy top-ups (the industry lingo for buying the remaining sample in a quota) or negotiate prices. What we would really like to stress is that it is almost impossible to control and adjust for representativeness (e.g. by weighting) when sample is bought and re-bought by multiple data providers and there is little or no transparency about where it comes from.

You might say that “this study isn’t that important” and “we only need to know the direction on this study”. Sometimes that might be fair enough. If accuracy is not paramount, it might make sense not to be willing to pay a lot for it. Sometimes budgets are low, sometimes direction is all you need. However, we strongly encourage you to opt for fewer interviews with a representative sample, rather than buying from a biased or non-transparent source. If you go for fewer interviews, you will get a higher margin of error. If you go for a biased or non-transparent source, your results can be completely wrong, no matter how many interviews you buy.
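The trade-off can be sketched with a standard decomposition: the typical error of an estimate combines random sampling error, which shrinks with more interviews, and bias, which does not. The figures below (400 versus 5,000 interviews, a bias of five percentage points) are hypothetical values for illustration, and the formula assumes simple random sampling within each source.

```python
import math

def rmse(true_share, n, bias=0.0):
    """Root mean squared error of an estimated share.

    The source is assumed to return, on average, true_share + bias.
    Sampling variance shrinks with n; the squared bias does not.
    """
    expected = true_share + bias
    variance = expected * (1 - expected) / n
    return math.sqrt(bias ** 2 + variance)

small_representative = rmse(0.5, n=400)          # unbiased, few interviews
large_biased = rmse(0.5, n=5000, bias=0.05)      # many interviews, 5-point bias

print(f"400 representative interviews: {small_representative:.1%}")
print(f"5,000 biased interviews:       {large_biased:.1%}")
```

Under these assumptions, the typical error of the small representative sample is about half that of the large biased one, and adding more interviews to the biased source never brings its error below the size of the bias itself.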

It seems almost odd to have to write down these very simple and basic truths about statistics and methodology. Unfortunately, it feels necessary to speak up against those who claim that bias + bias = representativeness, and those who seem to believe that if you use witchcraft and blur your data sources you will somehow get the correct answers.

The role of journalism

The other strong concern is the role of journalism. Prior to the election, an enormous number of polls were presented in the media. It is striking, though, how seldom the media made the reader, listener or viewer understand what exactly was polled. You could read headlines like “The latest polls favor Hillary” or “Pollsters say, Hillary has an 89% chance of winning”, but more often than not the copy lacked basic methodological background information on how interviews were conducted, what the basic population of the study was or how many people were asked. By default, methodological background information ought to be released along with every statistic reported in the media, but a critical reader will find many statistics that lack this kind of information.

Why is this the case? Very often, journalists are not able to explain to their audiences what the pollsters have actually done. In some cases, they may not understand it themselves. In other cases, they may just not have asked. But frequently it will be because we as an industry have not informed them. This is an explicit reproach to ourselves as an industry. If others do not understand our standards, we need to explain what we are doing and for what reasons. We are the experts in our field. Why should you not average out a candidate’s prediction across different studies? What is the true meaning of representative? How does blended sampling affect the overall quality? What is the right weighting model? What is the difference between online and telephone interviewing? These are just some of the issues that are present, but that we fail to bring to the light of day.

Until proven wrong, at Norstat we believe that there were substantial problems with some of the electoral polls. We do, however, also believe that these problems have been aggravated by journalists with low competency in statistics, budget constraints and a tendency to choose fast and cheap data over quality and accuracy. A quick and cheap poll and a sensational headline might drive click rates, but they do nothing to ensure that the statistics are correct.

Therefore, we would like to call for more solid research and a more critical communication of insights. As long as cheap samples and fast predictions can make a big headline, our industry will suffer from being discredited. This is what Norstat will do: We promise to be transparent. We promise to work hard for quality in our data. We promise not to compromise on solid research. We promise to help the media and journalists get their story right by actively giving them adequate and correct information.

To finally end what is already too long, we would like to borrow from Nate Silver again and quote a statement that should make pollsters, analysts and the media alike reflect:

We strongly disagree with the idea that there was a massive polling error. Instead, there was a modest polling error, well in line with historical polling errors, but even a modest error was enough to provide for plenty of paths to victory for Trump. We think people should have been better prepared for it. There was widespread complacency about Clinton’s chances in a way that wasn’t justified by a careful analysis of the data and the uncertainties surrounding it.