Prediction Overconfidence
As I write this, it is
5 a.m. November 9, 2016. I got up to see
if, perhaps, Clinton had pulled out an unlikely victory. She had not. Trump
will be president.
The predictors did not do a good job. Here’s what I got when Googling the Huffington Post presidential prediction a few minutes ago (i.e., AFTER the election):
[screenshot of the Huffington Post presidential forecast]
The Huffington Post modelers were outliers, but let’s look at what the major modeling groups said the day before the election[1]:
New York Times (The Upshot): 84% chance Democrats will win the presidency
FiveThirtyEight: 64%
Huffington Post: 99%
PredictWise (PW): 89%
Princeton Election Consortium (PEC): >99%
Daily Kos (DK): 87%
Cook: Lean Dem
Rothenberg and Gonzales: Lean Dem
Sabato: Likely Dem
So where did they go wrong? Certainly there are difficulties in polling now, with nonresponse rates being very high. Pew has done a series of studies using the same methodology over the years, so we can compare response rates over time[2]: the response rate of a typical Pew telephone survey fell from 36% in 1997 to 9% in 2012.
This makes it a challenge to adjust for nonresponse, which is not random. I’d worried earlier that there might be some decent-sized pocket of Trump voters who weren’t admitting they were Trump voters because they thought that was a socially unpopular thing to do. That would technically be a bias, and the bias would be similar (correlated) across all states and polls.
And the final results are likely to be within the margin of error of the individual polls. But still, let’s not gloss over the fact that, in the end, there was a failure.
Oddly enough, this failure seems to me to be similar to the error in financial modeling that was one of the causes of the faulty risk assessments prior to the 2008 financial crisis, which led us into years of recession. That error is a failure to accurately measure the amount of intercorrelation.
I see an inkling of this in statistician and political scientist Andrew Gelman’s blog post on election night[3]:
Election forecasting updating error: We
ignored correlations in some of our data, thus producing illusory precision in
our inferences
The election outcome is a
surprise in that it contradicts two pieces of information: Pre-election polls
and early-voting tallies. We knew that each of these indicators could be flawed
(polls because of differential nonresponse; early-voting tallies because of
extrapolation errors), but when the two pieces of evidence came to the same
conclusion, they gave us a false feeling of near-certainty.
In retrospect, a key
mistake in the forecast updating that Kremp and I
did, was that we ignored the correlation in the partial information
from early-voting tallies. Our model had correlations between state-level
forecasting errors (but maybe the corrs we used were still too low, hence
giving us illusory precision in our national estimates), but we did not include
any correlations at all in the errors from the early-voting estimates. That’s
why our probability forecasts were, wrongly, so close to 100%.
Put simply, if there was either a late surge in opinion toward Trump, or a hidden batch of Trump supporters, or Trump supporters were more likely to show up to vote than the models expected, these errors would not be random; they would be correlated. There would be more Trump votes than expected across nearly ALL states. Similarly, if Hillary supporters were less likely to show up to vote than expected, this would likely affect nearly ALL states, not occur randomly.
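To make that concrete, here is a minimal simulation sketch in Python. The leads and error sizes are made-up illustrative numbers (five hypothetical battleground states, about three points of total polling error), not any forecaster’s actual model; the only point is how the same per-state uncertainty yields a very different overall win probability depending on whether the errors are independent or mostly shared.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical polling leads (percentage points) for the favored candidate
# in five made-up battleground states -- illustrative numbers only.
poll_lead = np.array([3.0, 2.5, 2.0, 1.5, 1.0])
n_states = len(poll_lead)
n_sims = 100_000

def win_probability(shared_sd, state_sd):
    """P(favored candidate carries a majority of these states) when each
    state's polling error = shared national error + independent state error."""
    shared = rng.normal(0.0, shared_sd, size=(n_sims, 1))       # hits every state alike
    local = rng.normal(0.0, state_sd, size=(n_sims, n_states))  # state-specific noise
    margins = poll_lead + shared + local                        # simulated actual margins
    wins = (margins > 0).sum(axis=1)
    return (wins > n_states / 2).mean()

# Same total error of about 3 points, split two different ways:
print("independent errors only:   ", win_probability(shared_sd=0.0, state_sd=3.0))
print("mostly shared (correlated):", win_probability(shared_sd=2.5, state_sd=1.7))
```

With independent errors, a bad miss in one state tends to be offset by luck in another, so the chance of losing a majority of the states looks tiny; with a shared component of the same overall size, one national-level miss drags nearly all the states the same way, and the simulated win probability drops noticeably.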
Note that Gelman is aware of this problem, but doesn’t feel that he adjusted for it completely enough.
So how is this related to the financial crisis? Recall those mortgages that were packaged together, each with a certain probability of failing. If each mortgage had, say, a 2% chance of failing, then a bundle of 1,000 mortgages would have about 20 failures, with a 95% chance that the number would fall between roughly 12 and 28. But that’s only if the mortgage failures were independent. They aren’t. As in Gelman’s note above, it was known that the mortgages weren’t independent and that a correlation needed to be estimated (a simulation contrasting independent and correlated defaults is sketched further below). Felix Salmon, in his article
“Recipe for Disaster: The Formula That Killed Wall Street”[4]
notes that this estimation of the correlation by David X. Li using a Gaussian
copula function
“looked like an unambiguously positive breakthrough, a piece of financial technology that allowed hugely complex risks to be modeled with more ease and accuracy than ever before… His method was adopted by everybody from bond investors and Wall Street banks to ratings agencies and regulators. And it became so deeply entrenched – and was making people so much money – that warnings about its limitations were largely ignored.”
“Using
some relatively simple math—by Wall Street standards, anyway—Li came up with an
ingenious way to model default correlation without even looking at historical
default data. Instead, he used market data about the prices of instruments
known as credit default swaps…. When the price of a credit default swap
goes up, that indicates that default risk has risen. Li's breakthrough was that
instead of waiting to assemble enough historical data about actual defaults,
which are rare in the real world, he used historical prices from the CDS
market.”
But there’s a problem,
as Salmon notes:
“The
damage was foreseeable and, in fact, foreseen. In 1998, before Li had even
invented his copula function, Paul Wilmott wrote
that "the correlations between financial quantities are notoriously
unstable." Wilmott, a quantitative-finance consultant and lecturer, argued
that no theory should be built on such unpredictable parameters. And he wasn't
alone. During the boom years, everybody could reel off reasons why the Gaussian
copula function wasn't perfect. Li's approach made no allowance for
unpredictability: It assumed that correlation was a constant rather than
something mercurial. Investment banks would regularly phone Stanford's Duffie
and ask him to come in and talk to them about exactly what Li's copula was.
Every time, he would warn them that it was not suitable for use in risk
management or valuation.
“In hindsight, ignoring those warnings looks
foolhardy. But at the time, it was easy. Banks dismissed them, partly because
the managers empowered to apply the brakes didn't understand the arguments
between various arms of the quant universe. Besides, they were making too much
money to stop.”
So, in both the election forecasting and in the financial forecasting of those tranched mortgage securities, we have a problem of not accurately understanding the correlations (and the stability of those correlations) between events. There are a lot of differences, of course, but the high-level similarity remains.
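Here is the simulation promised above, a minimal sketch of the mortgage side of the analogy. It uses a one-factor, Gaussian-copula-style latent variable with toy numbers (1,000 mortgages, a 2% marginal default probability, an assumed latent correlation of 0.3), not Li’s actual calibration; the point is only that the average number of defaults stays near 20 in both cases, while the spread and the tail risk balloon once defaults share a common factor.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

n_mortgages = 1_000
p_default = 0.02                  # 2% marginal default probability per mortgage
threshold = norm.ppf(p_default)   # default when the latent normal falls below this
n_sims = 10_000

def default_counts(rho):
    """Defaults per simulated period under a one-factor Gaussian copula:
    latent_i = sqrt(rho) * common_factor + sqrt(1 - rho) * idiosyncratic_i."""
    common = rng.normal(size=(n_sims, 1))
    idio = rng.normal(size=(n_sims, n_mortgages))
    latent = np.sqrt(rho) * common + np.sqrt(1 - rho) * idio
    return (latent < threshold).sum(axis=1)

for rho in (0.0, 0.3):
    counts = default_counts(rho)
    lo, hi = np.percentile(counts, [2.5, 97.5])
    print(f"rho={rho}: mean defaults = {counts.mean():.1f}, "
          f"95% interval = [{lo:.0f}, {hi:.0f}], "
          f"P(more than 60 defaults) = {np.mean(counts > 60):.3f}")
```

Under independence, the 95% interval comes out close to the 12-to-28 range quoted earlier, and sixty-plus defaults are essentially impossible; with the shared factor, bad periods with many times the expected number of defaults stop being unthinkable, which is the kind of tail risk the tranched securities were exposed to.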
And there is the human tendency toward overconfidence in predictions, which has been amply demonstrated many times[5], including even among Messy Matters blog readers (who tend to be professional statisticians) taking part in a survey called “Are You Overconfident?”[6]:
“The
bad news is that you’re terrible at making 90% confidence intervals. For
example, not a single person had all 10 of their intervals contain the true
answer, which, if everyone were perfectly calibrated, should’ve happened by
chance to 35% of you. Getting less than 6 good intervals should, statistically,
not have happened to anyone. How many actually had 5 or fewer good intervals?
76% of you.”
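The arithmetic behind those quoted figures is easy to check, assuming each of the 10 intervals independently has a 90% chance of covering the true answer; a quick sketch:

```python
from math import comb

p = 0.9   # nominal coverage of each 90% confidence interval
n = 10    # intervals per respondent

def binom_pmf(k):
    """Probability that exactly k of the n intervals cover the true value."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p_all_ten = binom_pmf(10)                                # all 10 intervals correct
p_five_or_fewer = sum(binom_pmf(k) for k in range(6))    # 5 or fewer correct

print(f"P(all 10 correct)     = {p_all_ten:.3f}")        # about 0.349, i.e. ~35%
print(f"P(5 or fewer correct) = {p_five_or_fewer:.5f}")  # about 0.0016
```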
So, in both the financial collapse and in the 2016 election predictions, we have an inability to accurately understand the correlation between events, combined with a bias toward overconfidence that seems to be a persistent human trait.
Regardless of our post hoc understanding, we still had a deep recession after the financial collapse, and we will still have Donald Trump as president. So, there may be understanding, but there will also be pain. Can we not gain the learning without the pain?
I’m trying not to think
about the more complex situation of climate models.
Fun Fact: Here's a surprising number. I downloaded the polls data from FiveThirtyEight (the polls-only forecast) and looked only at polls since September 1. The total sample size across these polls? 3,155,370 (these include polls done in individual states, mostly swing states). That's a truly staggering number. So, while individual polls have a sampling margin of error, the error in polling as a whole is due to nonsampling errors (most commonly summarized under the term "biases").
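For what it’s worth, here is roughly how that tally could be reproduced. The file and column names below (presidential_polls.csv, type, enddate, samplesize) are my assumptions about the FiveThirtyEight 2016 data export and may need adjusting to match the actual file.

```python
import pandas as pd

# Sketch of the tally described above; file and column names are assumptions.
polls = pd.read_csv("presidential_polls.csv", parse_dates=["enddate"])

# The export may list each poll once per forecast flavor; keep one flavor only.
polls = polls[polls["type"] == "polls-only"]

recent = polls[polls["enddate"] >= "2016-09-01"]   # polls ending on or after Sept 1
total_respondents = recent["samplesize"].sum()

print(f"{len(recent)} poll entries, {total_respondents:,.0f} total respondents")
```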
[1] Josh Katz, “2016 Election Forecast: Who Will Be President,” The Upshot, The New York Times, updated Monday, November 7, 2016, 6:58 a.m.
[2] Pew Research Center, “Assessing the Representativeness of Public Opinion Surveys” (May 15, 2012 report), http://www.people-press.org/2012/05/15/assessing-the-representativeness-of-public-opinion-surveys/, accessed November 9, 2016.
[3] Andrew Gelman, “Election forecasting updating error: We ignored correlations in some of our data, thus producing illusory precision in our inferences,” Statistical Modeling, Causal Inference, and Social Science (blog), November 9, 2016.
[4] Felix Salmon, “Recipe for Disaster: The Formula That Killed Wall Street,” Wired, February 23, 2009, https://www.wired.com/2009/02/wp-quant/, accessed November 9, 2016.
[5] Mannes, A. and Moore, D. (2013), “I know I’m right! A behavioural view of overconfidence,” Significance, 10: 10–14. doi:10.1111/j.1740-9713.2013.00674.x
[6] Daniel Reeves, “Are You Overconfident?,” Messy Matters (blog), February 2010, http://messymatters.com/calibration/, and results in “Yes, You Are (Maybe) Overconfident,” March 31, 2010, http://messymatters.com/calibration-results/ (accessed November 9, 2016).