
Wednesday, November 09, 2016

What went wrong with the presidential polling pundits?

Prediction Overconfidence

As I write this, it is 5 a.m. November 9, 2016.  I got up to see if, perhaps, Clinton had pulled out an unlikely victory. She had not. Trump will be president.

The predictors did not do a good job. Here’s what I got when Googling the Huffington Post presidential prediction a few minutes ago (i.e. AFTER the election):

The Huffington modelers were outliers, but let’s look at what the major modeling groups said the day before the election[1]:

New York Times: 84% chance Democrats will win the presidency
FiveThirtyEight: 64%
Huffington Post: 99%
PW: 89%
PEC: >99%
DK: 87%
Cook: Lean Dem
Rothenberg and Gonzales: Lean Dem
Sabato: Likely Dem

So where did they go wrong? Certainly there are difficulties in polling now, with nonresponse rates being very high.  Pew has done a series of studies using the same methodology over the years, so we can compare response rates[2]:

This makes it a challenge to adjust for nonresponse, which is not random. I’d worried earlier that there might be some decent-sized pocket of Trump voters who weren’t admitting they were Trump voters, because they thought that was a socially unpopular thing to do. That would technically be a bias, and the bias would be similar (correlated) across all states and polls.

And the results are likely to be within the margin of error of the individual polls. But, still, let’s not gloss over the fact that, in the end, there was a failure.

Oddly enough, this failure seems to me similar to the error in financial modeling that was one of the causes of the faulty risk assessments prior to the 2008 financial crisis, which led us into years of recession: a failure to accurately measure the amount of intercorrelation.

I see an inkling of this in statistician and political scientist Andrew Gelman’s blog post election night[3]:

Election forecasting updating error: We ignored correlations in some of our data, thus producing illusory precision in our inferences

Posted by Andrew on
The election outcome is a surprise in that it contradicts two pieces of information: Pre-election polls and early-voting tallies. We knew that each of these indicators could be flawed (polls because of differential nonresponse; early-voting tallies because of extrapolation errors), but when the two pieces of evidence came to the same conclusion, they gave us a false feeling of near-certainty.
In retrospect, a key mistake in the forecast updating that Kremp and I did, was that we ignored the correlation in the partial information from early-voting tallies. Our model had correlations between state-level forecasting errors (but maybe the corrs we used were still too low, hence giving us illusory precision in our national estimates), but we did not include any correlations at all in the errors from the early-voting estimates. That’s why our probability forecasts were, wrongly, so close to 100%.
Put simply, if there was a late surge in opinion toward Trump, or a hidden batch of Trump supporters, or if Trump supporters were more likely to show up to vote than the models expected, these errors would not be random; they would be correlated. There would be more Trump votes across nearly ALL states. Similarly, if Hillary supporters were less likely to show up to vote than expected, this would be likely to affect nearly ALL states, not occur randomly.

Note Gelman is aware of this problem, but doesn’t feel that he adjusted completely enough for it.
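
To see why correlated errors matter so much, here is a minimal simulation sketch. Every number in it is hypothetical and chosen only for illustration: 11 states, a 2-point poll lead in each, and a total polling error of 3 points (standard deviation) per state. The only thing that changes between the two runs is how much of that error is shared across states; the function name and parameters are mine, not anything taken from Gelman's model.

```python
import random

def upset_probability(shared_sd, state_sd, lead=2.0, n_states=11,
                      trials=20000, seed=0):
    """Estimate how often the poll leader loses a majority of states.

    Each state's actual margin = lead + shared error + state-specific error.
    All values are hypothetical and in percentage points.
    """
    rng = random.Random(seed)
    upsets = 0
    for _ in range(trials):
        shared = rng.gauss(0.0, shared_sd)  # one draw that hits every state alike
        lost = sum(
            1 for _ in range(n_states)
            if lead + shared + rng.gauss(0.0, state_sd) < 0
        )
        if lost > n_states // 2:
            upsets += 1
    return upsets / trials

# Same total error per state in both cases (sd = 3 points):
independent = upset_probability(shared_sd=0.0, state_sd=3.0)
correlated = upset_probability(shared_sd=2.4, state_sd=1.8)  # 2.4^2 + 1.8^2 = 3^2

print(f"independent errors: upset probability ~ {independent:.3f}")
print(f"correlated errors:  upset probability ~ {correlated:.3f}")
```

With independent errors, a miss in one state tends to be offset by a hit in another, so an overall upset looks very unlikely. Once most of the error is shared, the states move together and the upset becomes several times more likely, even though no single state's poll is any less accurate.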

So how is this related to the financial crisis? Recall those mortgages that were packaged together, each with a certain probability of failing. If each mortgage had, say, a 2% chance of failing, then a bundle of 1000 mortgages would have about 20 failing, with roughly a 95% chance that the number will be between 12 and 28. But that’s only if the mortgage failures were independent. They aren’t. As with Gelman in his note above, it was known that the mortgages weren’t independent and that a correlation needed to be estimated. Felix Salmon, in his article “Recipe for Disaster: The Formula That Killed Wall Street”[4], notes that this estimation of the correlation by David X. Li using a Gaussian copula function
“looked like an unambiguously positive breakthrough, a piece of financial technology that allowed hugely complex risks to be modeled with more ease and accuracy than ever before… His method was adopted by everybody from bond investors and Wall Street banks to ratings agencies and regulators. And it became so deeply entrenched – and was making people so much money – that warnings about its limitations were largely ignored.”
“Using some relatively simple math—by Wall Street standards, anyway—Li came up with an ingenious way to model default correlation without even looking at historical default data. Instead, he used market data about the prices of instruments known as credit default swaps…. When the price of a credit default swap goes up, that indicates that default risk has risen. Li's breakthrough was that instead of waiting to assemble enough historical data about actual defaults, which are rare in the real world, he used historical prices from the CDS market.”
But there’s a problem, as Salmon notes:
“The damage was foreseeable and, in fact, foreseen. In 1998, before Li had even invented his copula function, Paul Wilmott wrote that "the correlations between financial quantities are notoriously unstable." Wilmott, a quantitative-finance consultant and lecturer, argued that no theory should be built on such unpredictable parameters. And he wasn't alone. During the boom years, everybody could reel off reasons why the Gaussian copula function wasn't perfect. Li's approach made no allowance for unpredictability: It assumed that correlation was a constant rather than something mercurial. Investment banks would regularly phone Stanford's Duffie and ask him to come in and talk to them about exactly what Li's copula was. Every time, he would warn them that it was not suitable for use in risk management or valuation.

“In hindsight, ignoring those warnings looks foolhardy. But at the time, it was easy. Banks dismissed them, partly because the managers empowered to apply the brakes didn't understand the arguments between various arms of the quant universe. Besides, they were making too much money to stop.”
So, in both the election forecasting and the financial forecasting of those tranched mortgage securities, we have a failure to accurately understand the correlations (and the stability of the correlations) between events. There are a lot of differences, of course, but the high-level similarities remain.
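
The mortgage arithmetic above can be checked with a short sketch. The first part computes the exact binomial chance that 12 to 28 of 1000 independent mortgages fail at a 2% rate. The second part is a toy "shared shock" model that I am assuming purely for illustration (it is not Li's Gaussian copula): the average failure rate is still 2%, but all mortgages share a good or bad regime.

```python
import math
import random

N, P = 1000, 0.02  # bundle size and per-mortgage failure chance from the text

def binom_pmf(k, n, p):
    """Exact binomial probability of k failures out of n."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Independent failures: exact chance the count lands between 12 and 28.
inside_independent = sum(binom_pmf(k, N, P) for k in range(12, 29))

def simulate_shared_shock(trials=2000, seed=1):
    """Toy model (an illustrative assumption, not Li's copula): 10% of the
    time every mortgage is in a 'bad' regime with an 11% failure chance,
    otherwise 1%. Average failure chance: 0.10*0.11 + 0.90*0.01 = 2%."""
    rng = random.Random(seed)
    inside = above = 0
    for _ in range(trials):
        p = 0.11 if rng.random() < 0.10 else 0.01
        failures = sum(1 for _ in range(N) if rng.random() < p)
        inside += 12 <= failures <= 28   # count landed in the "95%" range
        above += failures > 28           # count blew past the range
    return inside / trials, above / trials

inside_shared, above_shared = simulate_shared_shock()
print(f"independent:  P(12..28) = {inside_independent:.3f}")
print(f"shared shock: P(12..28) = {inside_shared:.3f}, P(>28) = {above_shared:.3f}")
```

Under independence the 12-to-28 range holds about 95% of the time. Under the shared-shock assumption the same range is badly wrong, and the chance of far more than 28 failures is no longer negligible, which is the kind of tail the independence calculation hides.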

And there is the human tendency toward overconfidence in predictions, which has been amply demonstrated many times[5], even among readers of the Messy Matters blog (who tend to be professional statisticians) who took part in a survey called “Are You Overconfident?”[6]:

“The bad news is that you’re terrible at making 90% confidence intervals. For example, not a single person had all 10 of their intervals contain the true answer, which, if everyone were perfectly calibrated, should’ve happened by chance to 35% of you. Getting less than 6 good intervals should, statistically, not have happened to anyone. How many actually had 5 or fewer good intervals? 76% of you.”
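
The two benchmark figures in that quote follow from a binomial calculation, assuming each of the 10 intervals independently has a 90% chance of containing the true answer. A quick sketch (the variable and function names are mine):

```python
import math

HIT_RATE = 0.90   # chance a well-calibrated 90% interval contains the truth
QUESTIONS = 10    # number of intervals each reader gave

def prob_hits(k, n=QUESTIONS, p=HIT_RATE):
    """Binomial probability that exactly k of n intervals contain the truth."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

p_all_ten = prob_hits(10)                                 # all 10 intervals good
p_five_or_fewer = sum(prob_hits(k) for k in range(0, 6))  # 5 or fewer good

print(f"P(all 10 good)     = {p_all_ten:.1%}")
print(f"P(5 or fewer good) = {p_five_or_fewer:.2%}")
```

The first figure is about 35%, matching the quote. The second is well under 1%, which is why "should not have happened to anyone" is a fair summary, and why 76% of readers landing there is such strong evidence of overconfidence.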

So, in both the financial collapse and in the 2016 election predictions we have an inability to accurately understand the correlation between events, combined with the bias toward overconfidence that seems to be a persistently human trait. 

Regardless of our post hoc understanding, we still had a deep recession after the financial collapse and we will still have Donald Trump as president. So, there may be understanding, but there will also be pain. Can we not have a gain in learning without pain?

I’m trying not to think about the more complex situation of climate models.

Fun Fact: Here's a surprising number. I downloaded the polls data from FiveThirtyEight (polls-only forecast) and looked only at polls since Sept 1. The total sample size in these polls? 3,155,370 (this includes polls done in states, mostly in swing states). That's a truly staggering number. So, while individual polls have a margin of error due to sampling, the error in polling as a whole is due to nonsampling errors (most commonly summarized under the term "biases").
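
To put that sample size in perspective, here is a rough sketch of the textbook 95% margin of error for a proportion near 50%. It assumes simple random sampling and treats the pooled polls as one big sample, which is an idealization; the 1,000-person poll is a hypothetical size used for comparison.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion p with sample size n,
    assuming simple random sampling (an idealization for real polls)."""
    return z * math.sqrt(p * (1 - p) / n)

single_poll = margin_of_error(1000)    # hypothetical individual poll
pooled = margin_of_error(3_155_370)    # total sample size quoted above

print(f"single poll (n=1,000): +/- {single_poll * 100:.2f} points")
print(f"pooled (n=3,155,370):  +/- {pooled * 100:.3f} points")
```

Sampling error for a single poll of this size is around 3 points, while for the pooled total it is a small fraction of a point. A multi-point miss across the polls as a whole therefore cannot be explained by sampling error.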

[1] Josh Katz, The Upshot, New York Times, 2016 Election Forecast: Who Will Be President, updated Monday Nov 7, 2016 6:58a.m.
[2] “Assessing the Representativeness of Public Opinion Surveys” (May 16, 2012 report) http://www.people-press.org/2012/05/15/assessing-the-representativeness-of-public-opinion-surveys/ , accessed November 9, 2016.
[4] Felix Salmon “Recipe for Disaster: The Formula That Killed Wall Street” Wired, February 23, 2009 https://www.wired.com/2009/02/wp-quant/ , accessed November 9, 2016.
[5] Mannes, A. and Moore, D. (2013), I know I'm right! A behavioural view of overconfidence. Significance, 10: 10–14. doi:10.1111/j.1740-9713.2013.00674.x
[6] Daniel Reeves “Are You Overconfident?” Messy Matters (blog) Sunday, February 2010 http://messymatters.com/calibration/ and results “Yes, You Are (Maybe) Overconfident”, Wednesday, March 31, 2010. http://messymatters.com/calibration-results/ (accessed November 9, 2016)