Dear Analyst

Dear Analyst #36: What The Economist’s model for the 2020 presidential election can teach us about forecasting

July 13, 2020

On a recent episode of The Intelligence, the data editor at The Economist spoke about a U.S. presidential election forecast the publication is working on. I looked into the model, and in this episode I discuss some of its features and parameters and what makes the forecast unique. Some of the techniques used in The Economist's model can be applied to your own forecasting use cases. To see a summary of The Economist's model, see this page. Learn more about how the model works on this page.

Source: The Economist

Key takeaways and a caveat

The model uses machine learning and multiple data sources, and it's easy to get caught up in the details. Here are the key takeaways as described by Dan Rosenheck, the data editor at The Economist:

* Machine learning is used to create equations to predict the outcome of the 2020 presidential election
* Polls taken early in the election cycle are less reliable than later ones
* Partisan non-response bias can make a supporter more or less likely to respond to a pollster when there is extremely good or bad news about that supporter's party or candidate
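To make the non-response point concrete, here is a small arithmetic sketch. The function name and the response rates are made up for illustration and do not come from The Economist's model. It assumes a two-candidate race where support is split evenly but one side's supporters pick up the phone a little more often.

```python
def observed_share(true_share_a, response_rate_a, response_rate_b):
    """Share of poll respondents backing candidate A.

    true_share_a: A's real share of the electorate (0 to 1).
    response_rate_a / response_rate_b: hypothetical fraction of each
    candidate's supporters who actually answer the pollster.
    """
    respondents_a = true_share_a * response_rate_a
    respondents_b = (1 - true_share_a) * response_rate_b
    return respondents_a / (respondents_a + respondents_b)


# Equal enthusiasm: the poll matches reality.
balanced = observed_share(0.5, 0.10, 0.10)

# Good news for A, bad news for B: B's supporters respond less often.
skewed = observed_share(0.5, 0.10, 0.08)

print(f"Balanced response rates: {balanced:.1%}")
print(f"Skewed response rates:   {skewed:.1%}")
```

With these invented numbers, nobody has changed their mind, yet the raw poll moves from an even split to roughly 55.6% for candidate A simply because of who chose to respond.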

A caveat: The Economist's model and the various forecasting techniques it uses are definitely outside of my knowledge and skillset. Most of this episode is me learning more about the model and interpreting some of the results. You don't have to be a statistics programmer or data science professional to appreciate what the data team at The Economist has done. If you are working with data in any capacity, learning about subjects outside your comfort zone will only make you more knowledgeable about the data analysis process.

Fundamentals vs. early polling

One key finding from the model is that polls conducted in the first half of an election year are a pretty weak predictor of results. On the other hand, fundamental measures like the president's approval rating, GDP growth, and whether there is an incumbent running for re-election are much better predictors. This chart shows the difference between poll results and fundamentals for predicting the outcome in 1992:

Source: The Economist

The model primarily relies on these fundamental indicators, but over time the polls become a better indicator for predicting the outcome. In the last week leading up to the election in November, more weight is applied to the polls than the fundamentals.
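Here is one way you might sketch this shifting weight in your own forecasts. To be clear, this is not how The Economist's model computes its weights. The linear ramp, the 300-day horizon, the 0.9 cap, and the function names are all assumptions I picked to keep the example simple.

```python
def poll_weight(days_until_election, horizon_days=300, max_weight=0.9):
    """Weight given to polls (hypothetical linear ramp).

    At horizon_days or more before the election the weight is 0;
    on election day it reaches max_weight.
    """
    days = min(max(days_until_election, 0), horizon_days)
    return max_weight * (1 - days / horizon_days)


def blended_forecast(fundamentals_share, poll_share, days_until_election):
    """Weighted average of a fundamentals-based and a poll-based vote share."""
    w = poll_weight(days_until_election)
    return w * poll_share + (1 - w) * fundamentals_share


# Made-up inputs: fundamentals say 52%, polls say 56%.
for days in (300, 150, 7, 0):
    forecast = blended_forecast(52.0, 56.0, days)
    print(f"{days:>3} days out: poll weight {poll_weight(days):.2f}, forecast {forecast:.1f}%")
```

Far from the election the forecast sits on the fundamentals number, and it drifts toward the polling number as election day approaches. A real model would estimate the weights from historical data rather than hard-coding a ramp like this.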

The visualization below shows that early polls tend to overestimate a party's share of the vote (in this case the Democratic share) compared to fundamental indicators. As you get closer to election day, however, the polls start to become a better predictor:

Source: The Economist

Overfitting data

One downside The Economist points out in other models that try to forecast the presidential election is that their equations overfit historical data points. Think about it: if you tried to create an equation to predict who would win the NBA championship in 2020 based on 1990s data, you may create an equation that leans heavily toward the Bulls. Unfortunately, Michael Jordan isn't playing anymore and the 2020 NBA season is now being played in a bubble in Orlando.

Had to mention Jordan somewhere in this post :)
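If you want to see overfitting in action, here is a toy sketch with made-up numbers (nothing here comes from The Economist's model). The data follows a simple trend of y = 2x with a bit of alternating noise. One "model" is a straight line; the other is a polynomial that is forced through every historical point exactly. Both are then asked to predict the next point.

```python
# Toy data (made up for illustration): an underlying trend of y = 2x,
# with alternating noise of +1 / -1 added to each point.
train_x = [0, 1, 2, 3, 4, 5]
train_y = [1, 1, 5, 5, 9, 9]
next_x = 6                  # the "future" point we want to predict
true_trend = 2 * next_x     # what the underlying trend says: 12


def fit_line(xs, ys):
    """Ordinary least squares for a straight line; returns (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def interpolate(xs, ys, x):
    """Lagrange polynomial that passes through every training point exactly."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


slope, intercept = fit_line(train_x, train_y)
simple_pred = slope * next_x + intercept
overfit_pred = interpolate(train_x, train_y, next_x)

print(f"Underlying trend at x={next_x}: {true_trend}")
print(f"Straight-line prediction:   {simple_pred:.1f}")
print(f"Overfit polynomial:         {overfit_pred:.1f}")
```

The polynomial has zero error on the historical points, which looks great until you ask it about the future: it predicts about -51 when the trend says 12, while the "worse-fitting" straight line lands at about 11.4. The equation that memorized the past is the one that misses the most.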