
What’s Really Wrong with Polling

What Can Researchers Learn From Yet Another Major Polling Fail (Text Analytics Polling™)
Whatever your politics, I think you’ll agree that Tuesday’s election results were stunning. What is now being called an historic upset victory for Donald Trump apparently came as a complete shock to both campaigns, the media and, not least, the polling community.

The question everyone seems to be asking now is how could so many projections have been so far off the mark?

Some pretty savvy folks at Pew Research Center took a stab at some reasonable guesses on Wednesday—non-response bias, social desirability bias, etc.—all of which probably played a part, but I suspect there’s more to the story.

I believe the real problem lies with quantitative polling itself: it simply is not a good predictor of actual behavior.

Research Told Us Monday that Clinton Was In Trouble

On Monday I ran a blog post highlighting responses to what was inherently a question about the candidates’ respective positioning:

“Without looking, off the top of your mind, what issues does [insert candidate name] stand for?”

Interestingly, for both candidates, rather than naming a political issue or policy the candidate supported, respondents frequently offered up a critical comment about his or her character instead (reflecting a deep-seated, negative emotional disposition toward that candidate). [See chart below]

[Chart: Clinton vs. Trump, net differences in top-of-mind issue and character mentions]

Our analysis strongly suggested that Hillary Clinton was in more trouble than any of the other polling data to that point indicated.

Why?

  1. The #1 most popular response for Hillary Clinton involved the perception of dishonesty/corruption.

  2. The #1 and #2 most popular responses for Donald Trump related to platform (immigration, followed by pro-USA/America First), with perceived racism/hatemongering third.

Bear in mind, again, that these were unaided, top-of-mind responses to an open-ended question.

So for those keeping score, the most popular response for Clinton was an emotionally-charged character dig; the two most popular responses for Trump were related to political platform.

This suggested not only that Trump’s campaign messaging (“Make America Great Again”) was resonating better, but also that, of the two candidates, negative emotional disposition ran higher toward Hillary Clinton than toward Trump.

Did We Make a Mistake?

What I did not mention in that blog post was that initially my colleagues and I suspected we might have made a mistake.

Essentially, what these responses were telling us didn’t jibe with any of the projections available from pollsters, with the possible exception of the highly respected Nate Silver, who was actually criticized for being too generous with Trump in weighting poll numbers up (about a 36% chance of winning, or slightly better than the 25% odds of flipping tails twice in a row with a fair coin).

How could this be? Had we asked the wrong question? Was it the sample*?

Nope. The data were right. I just couldn’t believe everyone else could be so wrong.

So out of fear that I might look incompetent and/or just plain nuts, I decided to downplay what this data clearly showed.

I simply wrote, “This may prove problematic for the Clinton camp.”

The Real Problem with Polls

Well, I can’t say I told you so, because what I wrote was a colossal understatement; however, this experience has reinforced my conviction that conventional quantitative Likert-scale survey questions—the sort used in every poll—are generally not terrific predictors of actual behavior.

If I ask you a series of questions with a fixed set of answers or a rating scale, I’m not likely to get a response that tells me anything useful.

We know that consumers (and, yes, voters) are generally not rational decision-makers; people rely on emotions and heuristics to make most of their decisions.

If I really want to understand what will drive actual behavior, the surest way to find out is by allowing you to tell me unaided, in your own words, off the top of your head.

“How important is price to you on a scale of 1-10?” is no more likely to predict actual behavior than “How important is honesty to you in a president on a scale of 1-10?”

It applies to cans of tuna and to presidents.
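To make the contrast with scale questions concrete, here is a minimal sketch, in Python, of how unaided, open-ended answers can be tallied into topic frequencies. The keyword lists and sample responses are hypothetical illustrations, not OdinText’s actual taxonomy or data, and real text analytics goes well beyond simple keyword matching.

```python
from collections import Counter

# A minimal, hypothetical sketch of the unaided approach: instead of a 1-10
# rating, tally the topics people volunteer in their own words. The keyword
# lists and responses below are illustrative only, not OdinText's taxonomy
# or data.
TOPIC_KEYWORDS = {
    "dishonesty/corruption": ["liar", "corrupt", "crooked", "dishonest", "trust"],
    "immigration": ["wall", "immigration", "border"],
    "economy/jobs": ["jobs", "economy", "taxes"],
}

responses = [
    "she's corrupt and a liar",
    "build the wall and fix immigration",
    "jobs and the economy",
    "crooked, can't trust her",
]

counts = Counter()
for text in responses:
    lowered = text.lower()
    for topic, keywords in TOPIC_KEYWORDS.items():
        # Count a topic once per response if any of its keywords appear
        if any(word in lowered for word in keywords):
            counts[topic] += 1

for topic, n in counts.most_common():
    print(f"{topic}: mentioned in {n} of {len(responses)} responses")
```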

@TomHCAnderson

 

[*Note: N=3,000 responses were collected via Google Surveys, 11/5-11/7/2016. Google Surveys allows researchers to reach a validated (U.S. General Population Representative) sample by intercepting people attempting to access high-quality online content (such as news, entertainment and reference sites) or who have downloaded the Google Opinion Rewards mobile app. These users answer up to 10 questions in exchange for access to the content or Google Play credit. Google provides additional respondent information across a variety of variables, including source/publisher category, gender, age, geography, urban density, income, parental status and response time, as well as Google-calculated weighting. All 3,000 comments were then analyzed using OdinText to understand frequency of topics, emotions and key topic differences. Of the 65 total topics identified using OdinText, 19 were mentioned significantly more often for Clinton and 21 significantly more often for Trump. Results are accurate to within +/- 2.51% at the 95% confidence level.]
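For readers who want to sanity-check the numbers in the note above, here is a rough sketch of the two calculations it implies: the worst-case sampling margin of error, which comes out near the quoted +/- 2.51% if we assume roughly 1,500 of the 3,000 responses concern each candidate, and a simple two-proportion z-test for whether a topic is mentioned significantly more often for one candidate than the other. The counts in the example are illustrative assumptions, not the actual OdinText results.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Worst-case sampling margin of error for a proportion at ~95% confidence."""
    return z * math.sqrt(p * (1 - p) / n)

# Assuming ~1,500 of the 3,000 responses per candidate (an assumption, not stated in the note)
print(f"MoE: {margin_of_error(1500):.2%}")  # ~2.53%, close to the quoted +/- 2.51%

def two_proportion_z(x1, n1, x2, n2):
    """z statistic for comparing how often a topic is mentioned in two samples."""
    p1, p2 = x1 / n1, x2 / n2
    p_pool = (x1 + x2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Hypothetical counts: 420 of 1,500 Clinton responses vs. 240 of 1,500 Trump
# responses mention "dishonesty/corruption" (illustrative numbers only).
z = two_proportion_z(420, 1500, 240, 1500)
print(f"z = {z:.2f}")  # |z| > 1.96 suggests a significant difference at the 95% level
```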

 

19 Responses

  1. I’d be curious to see related machine learning predictive analytics research to see if there are any correlations between the “issues” features and the actual results.

  2. It’s also why Qual is (nearly always) better than Quant. We are dealing with people here, not robots, and you need to get under the skin of the respondents to reveal what they really feel. We are all bigoted and rarely overtly admit what we really think when it comes to judgments about our fellow man (or woman). I myself, despite a ‘rational’ take on what I thought should be the outcome of the election, rather suspected deep down that any man would always (in the current climate) win over a woman put up against him. (Sadly….)

  3. @Laura, agree, would be a logical and fun next step analysis
    @Jacky, problem with qual is projectability/numbers (don’t think Luntz was any better at predicting this). Nice thing about text analytics is that it gives you best of both worlds 😉

  4. The polls were not THAT wrong. They predicted a 4-pt Hillary win, but instead they tied (in popular vote). Basically, that means you take 2% from her, and give it to him. 2% is not a big error. Yes the polling and modeling failed to predict the winner… but not to the huge degree that everybody is talking about.

  5. Tom- Nice work here. Love the take on this. The scales may strip out the emotion in this case to an extent but perhaps it’s too big a leap to say it’s useless in predicting behavior in everything? Great food for thought. Thanks!

  6. I’m not sure what the chart is actually telling us. The title is “Clinton vs. Trump–Net Differences,” which suggests, to take one example, that the percentage of people who gave a “lying/cheating/corruption/crime/untrustworthy” response about Clinton was 12 percentage points greater than the percentage who mentioned one of these things for Trump. Is that the correct interpretation?

  7. Tom, nobody probably wants to see traditional pollsters disappear more than we do. However, are we being fair to say they were that badly wrong in this case? The majority of polling averages had Clinton winning the popular vote by around 3%. She won by 1%, well within a reasonable margin for any individual poll. I’ve seen the LA Times touting their numbers as the only ones that predicted Trump winning – they had him winning the popular vote by 3%…meaning they were actually MORE wrong than the pollsters who had HRC +3. The polling seemed to fall further short in a handful of key states, namely PA, WI, NC, and a couple others – still, however, only by <5 points in most cases. Did you look at your method above on a state-by-state basis? Or perhaps among population sub-groups like white women or blue-collar democrats, which seem to be the groups that tilted the election in the key states? I think the entire field of opinion research would love to find a better solution, so we're never so surprised again. Hopefully you're on to something here.

  8. Thanks for posting this Tom. I suspect the failure of the polls was an over-reliance on simplistic aided questions. Easy to collect and analyze, but lacking the richness of unaided, Qual-type questions. Actually going out to talk to people and understanding the drivers of their vote choice, the strength of their convictions, etc., would have uncovered another story and at the very least led to tweaking of the original quant survey. Of course this takes more time, money, effort, and I am guessing the “experts” didn’t choose to be that thoughtful.

  9. So wait, after an (almost) binary event occurred, you’ve pulled up some data analyzed retrospectively to “explain” the results and why the polls got it wrong? How would this thinking stand up to scientific rigor?
    Didn’t FiveThirtyEight go 51 for 51 in calling electoral wins last election? Quantitative research doesn’t work?

  10. The chart tells us more about positioning than anything else. Perhaps the most important aspect of marketing, and yet seemingly also the most neglected. However, the underlying data and topics (text-based features) can also be modeled and used to predict any target variable, in our experience far more accurately than yes/no or Likert-scale data.
    All the hubbub aside, pollsters have one very simple variable to calculate: win or no win. They failed.

    I think it’s interesting that so many researchers, unlike the pollsters, are never held accountable for real outcomes. Their assumptions about the accuracy of their structured data are never tested on actual outcomes.

    The Shell Oil case on our website is a great example. Most companies out there tracking customer satisfaction seem to be following NPS methodology. Yet it has never been proven to have a link to business growth, as its authors at Bain consulting have repeatedly claimed in Harvard Business Review and elsewhere. In fact, we proved them terribly wrong with data from close to 1 million customers. Conversely, we also proved that unstructured data (comments) analyzed via text analytics was far more accurate than their NPS metric or any other Likert-scale combination in the data.

    That is what I’m referring to in the post. Time to rethink things…

    Happy to show folks what I mean with their own data

  11. Here’s my take on a variety of topics mentioned:
    1. I agree the polls weren’t that far off in a pure comparison of who you would vote for.
    2. What I didn’t understand is how a 3-4% Hillary lead in many polls could possibly translate into a 75%+ likelihood to win… I feel stupid now for having bought into that.
    3. Likert scales have been very useful for me over the years… you just need to be careful not to use them to assess things that aren’t concrete. The who-would-you-vote-for question in this recent presidential election may well have had a 4-5% error for reasons indicated in the comments: social desirability, not wanting to be perceived as a pariah, plus non-response skewing toward less representation of Trump supporters.

  12. “In hindsight, I’m always right.” To me it seems that if Mr Anderson really had had sufficient confidence in his own research method, he would (should?) have drawn and communicated a much clearer and stronger conclusion before the actual outcomes were there.
    And not used it afterwards as a sales pitch.

  13. Hi Tom, very interesting post and an excellent contribution to the post-polling-debacle debate that will be continuing for months if not years. Some of your commenters are a little too kind on pollsters, I fear. The national polling is irrelevant to the outcome (as we have seen yet again) so whether the results were within the margin of error is rather beside the point. The damage was in the state polling, and it was a disaster. Time for a full-scale reconsideration.
    I do believe nonresponse bias was an important element, as has been shown pretty persuasively by Gelman, Rothschild, and Abramowitz (e.g., here: http://www.slate.com/articles/news_and_politics/politics/2016/08/don_t_be_fooled_by_clinton_trump_polling_bounces.html). But the wider meaning of this is more profound. It is that the whole “horse race” aspect of campaign coverage is largely (if not totally) artifactual, which means the media has merely been distracting us with irrelevancies for the last 18 months. What we’re seeing is pretty much the death of the “informed citizenry” theory of democracy (see Bartels and Achen, Democracy for Realists, just out earlier this year).

    This leaves us with a big question: what does drive election results? I do think your simple but intriguing study provides some partial answers. First, for “the people,” the word “issues” means something different, and more expansive, than it does for policy nerds and political activists. Normal people do not distinguish between issue positions, character, and personality traits. That’s worth remembering.

    Second, I’m surprised that some of your results weren’t even more extreme. Is the difference in mentions of “building a wall” between Trump and Hillary really only 4%? I would have guessed it would have been 90% on the Trump side. Or “Making America Great Again” … only 2% more mentioned for Trump than for Clinton? Any thoughts on why these differences are not even bigger?

    Third, I think your emphasis on “top of mind” associations is critically important. It gives us a view into the associational networks that surround the concepts “hillary” and “trump” in voters’ minds, along with the relative strengths of those associations. We have 30 years of conditioning that has succeeded in building and strengthening the connection between ‘hillary’ and ‘corruption’, which made it possible for Trump to crystallize that connection with the simple phrase “Crooked Hillary.” Like it or not, the man is a master of branding. Sadly, the factual basis of that connection is irrelevant. That set of nodes around the “great right-wing conspiracy” is only weakly integrated into the network. What matters is “you say hillary, I say crooked”. The unconscious mind does not evaluate, it only associates. That’s the lesson that pollsters and, indeed, most marketers and political operatives have yet to learn.

    Polling has become a staid and stale methodology. It has grown fat, rich, and happy, and change resistance among practitioners is high (there are exceptions, e.g., Doug Rivers at YouGov). This election may finally blow up that complacency. We have to find better ways to uncover the implicit, emotional, visceral, tribalistic “gut feelings” that underlie political orientations, choices, and behavior. I think your top of mind method, with its scalability and simplicity, could be a big part of a new paradigm. I also think approaches like implicit response testing and affective and semantic priming, maybe even some neuro techniques, need to be integrated into the methodological mix. It still matters what people say, but until we have better methods for understanding how they think, we’re not going to be able to predict how they act.

    1. I’m not involved in polling or research, so pardon my ignorance. If I understand the Google poll method, they randomly interrupt someone who is trying to access information by asking them to answer a poll. The incentive to complete the poll is much higher than the incentive to answer thoughtfully.
      If you can separate the mindless answers from the thoughtful ones, I can see the validity of this method. Otherwise, I don’t see how this is any more scientific than the “Person on the Street” interviews on the news, where a random passerby crafts a comment they think the interviewer wants to hear, or that will likely get them on the air.

  14. @ton, that’s part of what this post was about. I posted the findings the day before Election Day (Nov 7), but partly downplayed the findings since everyone else was giving Trump a 36% or far lower chance (original post http://odintext.com/blog/who-are-you-voting-against/ ).
    We are not pollsters. The post’s purpose was simply to demo various insights you can get via text analytics. However, based on how we’ve been able to more accurately predict things like actual customer return behavior (real, not stated) and sales better than with any structured survey data (certainly far better than with NPS), I think people need to start re-evaluating and rethinking what they just assume are best practices. Technology is evolving… your research should too.

  15. Hi Tom,
    I agree that Likert survey questions are not a good predictor of actual behavior. My background is in behavioral science. I was shocked to discover that many behavioral scientists in academia–the folks who emphasize the difference between “Econs” and “Humans”–still rely on a methodology that ignores emotion.

    However, I agree with Skeptical Inquirer that your article is using data retroactively to tell a good story. I read your post on November 7th. Although you wrote that the results of your text analysis “may prove problematic for the Clinton camp” you did not seem as convinced then as you do now that your findings did a better job of predicting the outcome of the election compared to the pollsters. The sentence, “Our analysis strongly suggested that Hillary Clinton was in more trouble than any of the other polling data to that point indicated,” which you wrote for this post, seems like textbook hindsight bias.

    That said, I’m impressed with your general approach and I just sent an email to your team to request a demo. I work in advertising and I think your software could help us understand how people think about brands, products, and activities (even though I’ve read that you mostly focus on helping clients take inventory of the data they already have). What if instead of asking 1,500 people what they think Hillary Clinton stands for, we ask them what Apple or Pepsi stands for? Or, if I have to do research for a client who sells household cleaning products, could I ask 1,500 people what they think about when they think about cleaning up around the house?

  16. I believe the key problems are related to the fact that we had two dangerous conditions. One, turnout was relatively low. Two, the voting preferences of those who stayed home were systematically different from those who showed up. We can still do a reasonable job of modelling known populations such as all eligible voters. In this case we were trying to predict the unknown population of actual voters. The problem isn’t so much a population modelling problem as a prediction problem. When turnout is relatively low and there are systematic differences across voters and non-voters, we are often not going to fare well. The use of likely-voter models is highly specious, and from our substantial testing over the years we have yet to find a unified theory of voter turnout. Sometimes the correlates of turnout (past voting patterns) predict the likelihood of voting and sometimes they don’t. Predicting the future is very hard, particularly about the future. If we could really predict same-day behaviour we would be sitting on a beach clipping coupons rather than having this conversation. The best economists and modelers have blown up trying to predict next-day behaviour. We really need to disentangle the problem of modeling known populations from predicting future behaviour. As Hume noted, the future need not resemble the past.
    I do think that the real problem was the pseudo-precision of the prediction models from aggregators that showed risibly inaccurate estimates of supposedly scientific probabilities. I watched the very impressive NY Times polling model show Clinton’s probability of winning drop from 80% to 8% in less than three hours. Sorry, but this was voodoo science. We shall probably understand this better after all the postmortems, but I think this is the essence of the problem.

  17. Thanks Tom. In my spare time this election season, I interviewed thousands of voters to understand what they were thinking. I was shocked by some of the things I was reading. The constant and extremely loud calls for Clinton to be thrown in jail. The persistent accusations of corruption, lying, and even treason. The near hysteria over her emails and, to top it all off, apparently Clinton had people murdered! I couldn’t believe people thought any of this was true, so I started asking. I ran a couple of surveys, using Likert scales of course, and had many conversations to determine if people really did believe this stuff. I thought it was just hyperbole, but now I’m convinced a very large number do think these things are true.
    My work was never meant to predict the winner; I only wanted to understand the voters, so I have not been rushed to publish it. The summary is still in the reporting stages and the qualitative data will take much longer to work through… I’ve heard you have a tool that can help with that. : ) What I think the data shows is a surprising number do believe all or most of the things said about Clinton. Even a few of her supporters had doubts about many of these things. It also showed the discontent among former Sanders supporters who had very strong doubts about Clinton’s integrity. A subset of former Sanders supporters, that said they were voting for Jill Stein, were more often than not in agreement with Trump supporters relative to their beliefs about Clinton.

    None of this led me to believe she would lose, I’m as surprised as everyone else, but I am convinced that whatever fears people have about Trump’s plans will not be any more disastrous than if she had won. Based on what I’m seeing, I believe the damage to the system’s (voting, elections, public officials, etc.) integrity would have been severe and based on his rhetoric, perpetuated by Trump for months if not years to come. It would not have surprised me if her victory led to a conspiracy theory industry rivaling the JFK assassination. That could be a complete exaggeration, but I don’t think it is.

    I should be ready to start distributing the summary later this week or early next.
