Predicting the outcomes of events that have yet to occur is one of the cornerstones of science. Elections offer opportunities to use polling and other data to develop forecasts that can then be assessed against the actual results. Such forecasts have become highly visible in recent years through efforts such as those of statistician Nate Silver and his colleagues at FiveThirtyEight. One of the most important aspects of the FiveThirtyEight forecasts is that they explicitly present distributions of likely outcomes with estimated probabilities rather than single predicted results.

Elections for the House of Representatives are particularly useful because they involve a large number of separate but somewhat correlated elections in the 435 congressional districts across the United States. FiveThirtyEight produced three forecasts for the 2018 congressional elections: one based on polls only (Lite); one based on polls plus fundraising, district history, and historical trends (Classic); and one that added expert ratings to the Classic forecast (Deluxe). I begin by examining how these forecasts compare with one another, using the final forecasts issued before any election results were available.

The fundamental basis for these forecasts is an estimate of the probability distribution of each candidate's percentage of the vote. For the polls-only (Lite) forecast, the available polls are the primary data; they are combined using corrections and weighting factors based on each poll's methodology and on the historical accuracy of each pollster. The distribution of the Lite forecast's predicted vote percentages for Democratic candidates is shown in Figure 1. Note that some candidates ran unopposed, one congressional race in California paired two Republican candidates, one race in Washington paired two Democratic candidates, races in Louisiana had numerous candidates from various parties, and many races included third-party candidates.
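To make the idea of combining polls concrete, here is a minimal Python sketch of a weighted average of polls after subtracting each pollster's estimated lean (its "house effect"). It is not FiveThirtyEight's method: the `Poll` class, the `house_effect` and `weight` values, and `adjusted_average` are all hypothetical, and in a real forecast the corrections and weights would come from a model of pollster methodology and track record.

```python
from dataclasses import dataclass


@dataclass
class Poll:
    dem_share: float     # Democratic percentage reported by the poll
    house_effect: float  # hypothetical estimated lean toward Democrats, in points
    weight: float        # hypothetical weight reflecting methodology and track record


def adjusted_average(polls):
    """Weighted mean of house-effect-corrected Democratic shares (illustrative only)."""
    total_weight = sum(p.weight for p in polls)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    # Correct each poll for its assumed lean, then weight it.
    return sum(p.weight * (p.dem_share - p.house_effect) for p in polls) / total_weight


polls = [
    Poll(dem_share=52.0, house_effect=1.0, weight=2.0),
    Poll(dem_share=48.0, house_effect=-0.5, weight=1.0),
]
print(f"{adjusted_average(polls):.2f}")
```

Here the first poll is corrected from 52% to 51% and counts twice as much as the second, which is corrected from 48% to 48.5%, giving a combined estimate of about 50.2%.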

In almost exactly half of the 435 races, the Democratic candidate was forecast to receive 50% or more of the vote, and in an additional 106 races, between 40 and 50%.

## Comparison of different FiveThirtyEight congressional forecasts

I now consider the more elaborate forecasts from FiveThirtyEight. The predicted percentages for Democratic candidates for the Lite and Classic forecasts are compared in Figure 2.

The average difference between the Lite and Classic forecasts is 0.52 percentage points (the Lite forecast is higher on average), with a standard deviation of 3.16 percentage points, and the correlation coefficient between the two forecasts is 0.9901.

The Deluxe forecast differs only slightly from the Classic forecast: the average difference between them is 0.16 percentage points (the Classic forecast is higher on average), with a standard deviation of 0.59 percentage points. The correlation coefficient between these two forecasts is 0.9996.
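The three summary statistics used in these comparisons (average difference, standard deviation of the differences, and correlation coefficient) can be computed with a short Python function. This is a sketch: `compare` is a hypothetical helper, the input values are illustrative rather than FiveThirtyEight data, and I assume the sample standard deviation, since the text does not say which form was used.

```python
import math
import statistics


def compare(a, b):
    """Return (mean of a - b, sample SD of a - b, Pearson correlation of a and b)."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length sequences of at least two values")
    diffs = [x - y for x, y in zip(a, b)]
    mean_diff = statistics.fmean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (an assumption)
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    # Pearson r from sums of cross-products and sums of squares about the means.
    cross = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    r = cross / math.sqrt(ss_a * ss_b)
    return mean_diff, sd_diff, r


# Illustrative Democratic vote percentages for four races (not actual forecasts).
lite = [55.0, 40.0, 62.0, 48.0]
classic = [54.0, 41.0, 60.0, 47.5]
mean_diff, sd_diff, r = compare(lite, classic)
print(f"mean difference {mean_diff:.2f} points, SD {sd_diff:.2f} points, r = {r:.4f}")
```

Passing the Lite forecast as `a` and the Classic forecast as `b` makes a positive mean difference indicate that the Lite forecast is higher on average, which is the sign convention used for the Lite and Classic comparison.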

## Comparisons between forecast percentages and election results

I now compare the percentages from the FiveThirtyEight forecasts with the results from the election. These results were obtained from politico.com on 15 November. Although these results were not certified at the time of this writing, the percentages are very unlikely to change enough to affect the analysis below. I will focus on the FiveThirtyEight Deluxe forecast.

The actual percentages from the election are compared with those from the Deluxe forecast in Figure 3.

Overall, the correlation coefficient between the actual results and those from the forecast is 0.9874.

The differences between the actual percentages and those from the Deluxe forecast are shown in Figure 4, and a histogram of these differences is shown in Figure 5.

The average difference between the actual percentage and the Deluxe forecast is -0.63 percentage points (the Deluxe forecast is higher on average), with a standard deviation of 3.16 percentage points.

How do these results compare with those for the other two forecasts? For the Lite forecast, the correlation coefficient with the election results is 0.9788, and the average difference is -1.31 percentage points with a standard deviation of 4.48 percentage points. For the Classic forecast, the correlation coefficient with the election results is 0.9873, and the average difference is -0.79 percentage points with a standard deviation of 3.14 percentage points. Thus, the Lite forecast performed substantially worse than the Classic and Deluxe forecasts. The Deluxe forecast performed very slightly better than the Classic forecast, with a marginally higher correlation coefficient and a smaller average difference, although its standard deviation was marginally larger.

## Success in predicting election winners

Although the accuracy of the estimated vote percentages is impressive, elections are decided by which candidate receives the most votes. Correctly predicting that a candidate will receive 75% rather than 65% of the vote makes no difference to the outcome, because this candidate wins in either case. Thus, the accuracy and precision of predictions near 50% (in a two-person race) are critical. If these predictions were highly accurate and precise, predicting elections would be straightforward: simply pick the candidate predicted to receive the higher percentage of votes.

Winners have been declared in 429 of the 435 congressional races as of this writing; the remaining races are still too close to call. For this analysis, I assume that the current vote leader in each of the remaining races will eventually be declared the winner. Overall, the candidate predicted by the Deluxe forecast to receive the most votes won in 425 races, or 97.7%.
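Counting correctly called races only requires comparing the leader in the forecast with the leader in the results, which also works for races with more than two candidates or with two candidates from the same party. In this sketch, `called_correctly` and the example percentages are hypothetical; ties are resolved by Python's `max`, which keeps the first candidate with the largest value.

```python
def called_correctly(forecast, result):
    """True if the candidate with the highest forecast share also received the most votes.

    forecast, result: dicts mapping candidate labels to vote percentages.
    Ties are resolved by max(), which keeps the first candidate with the maximum value.
    """
    return max(forecast, key=forecast.get) == max(result, key=result.get)


# Illustrative (forecast, result) pairs; not actual 2018 data.
races = [
    ({"D": 52.1, "R": 47.9}, {"D": 50.8, "R": 49.2}),
    ({"D": 49.5, "R": 50.5}, {"D": 50.2, "R": 49.8}),      # forecast leader lost
    ({"R1": 55.0, "R2": 45.0}, {"R1": 53.3, "R2": 46.7}),  # same-party pairing
]
hits = sum(called_correctly(f, r) for f, r in races)
print(f"{hits}/{len(races)} = {hits / len(races):.1%}")
```

Applied to the Deluxe forecast, the same count gives 425 of 435 races, or 425/435 ≈ 97.7%.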

However, predictions from polls and other data are imprecise, with uncertainties of several percentage points or more. Forecasts such as those from FiveThirtyEight handle these uncertainties by running thousands of simulated elections in which each candidate’s percentage is allowed to vary from its predicted value. These variations can have multiple components: independent variations that affect only one race; broader variations that affect multiple races, reflecting, for example, national trends at the time of the election or regional effects; and variations that reflect other factors, such as incumbency, that can influence polling accuracy. Once thousands of simulations have been run, the probability that a given candidate will win can be estimated as the fraction of simulations in which she or he received a higher percentage of votes than her or his opponents. For example, if a candidate's baseline prediction is 70% of the vote, the variations will rarely if ever be large enough to make this candidate lose a simulation, so this candidate is forecast to win with high probability. On the other hand, a candidate predicted to receive 50% of the vote in a two-person race might win half of the simulations and lose the rest, yielding a 50% probability of winning. A forecast may therefore assign a 50% win probability to one candidate in each of a number of races. Such a forecast is accurate if about half of these candidates win their races, even though it does not say which half.
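The simulation approach can be illustrated with a minimal Monte Carlo sketch. It assumes two-candidate races, normally distributed errors, and only two error components: a national shift shared by every race within a simulation (which makes the races correlated) and an independent district-level error. The function `win_probabilities` and the error sizes `national_sd` and `district_sd` are hypothetical choices for illustration; FiveThirtyEight's model includes further components, such as regional and incumbency effects, and is not reproduced here.

```python
import random


def win_probabilities(predicted_dem, national_sd=3.0, district_sd=4.0,
                      n_sims=10_000, seed=0):
    """Estimate each race's Democratic win probability by simulation.

    predicted_dem: baseline Democratic percentage for each two-candidate race.
    national_sd, district_sd: hypothetical error sizes in percentage points.
    """
    rng = random.Random(seed)
    wins = [0] * len(predicted_dem)
    for _ in range(n_sims):
        # One national shift per simulated election, shared by every race.
        national_shift = rng.gauss(0.0, national_sd)
        for i, share in enumerate(predicted_dem):
            # Add an independent district-level error for this race.
            simulated = share + national_shift + rng.gauss(0.0, district_sd)
            if simulated > 50.0:  # Democrat wins a two-candidate race
                wins[i] += 1
    return [w / n_sims for w in wins]


probs = win_probabilities([70.0, 50.0, 45.0])
print([round(p, 2) for p in probs])
```

With these illustrative error sizes, a candidate predicted at 70% almost never loses, one predicted at 50% wins about half of the simulations, and one predicted at 45% wins roughly one simulation in six. Setting `national_sd=0` removes the shared component and makes the races independent.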

First, consider races in which one candidate was strongly favored in the Deluxe forecast. There were 192 races in which the probability of a Democratic win was between 0 and 25%; the average probability across this pool was 3.6%, corresponding to about seven expected Democratic wins (192 × 3.6% ≈ 6.9). The Democratic candidate won three of these races, or 1.6%, which is lower than but in reasonable agreement with the expectation. Similarly, there were 208 races in which the probability of a Democratic win was between 75 and 100%; the average probability across this pool was 98.9%, corresponding to about two expected Republican wins (208 × 1.1% ≈ 2.3), yet the Democratic candidate won every one of these races. Thus, the election produced somewhat fewer major upsets than the forecast predicted, although changing the outcomes of only a few races would alter this observation.
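Whether the observed counts are in "reasonable agreement" with the pool averages can be checked roughly with a binomial model. This sketch treats every race in a pool as having the pool's average probability and treats the races as independent; both are simplifying assumptions (the actual probabilities vary from race to race, and forecast errors are correlated), so the resulting tail probabilities are only indicative. `prob_at_most` is a hypothetical helper built on the standard-library `math.comb`.

```python
from math import comb


def prob_at_most(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p); assumes independent races with equal probability p."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))


# Low-probability pool: 192 races, average Democratic win probability 3.6%, 3 Democratic wins.
expected_dem_wins = 192 * 0.036
p_three_or_fewer = prob_at_most(3, 192, 0.036)

# High-probability pool: 208 races, average probability 98.9% (1.1% Republican win chance each),
# with 0 Republican wins observed.
expected_rep_wins = 208 * 0.011
p_no_rep_wins = prob_at_most(0, 208, 0.011)

print(f"low pool: expected {expected_dem_wins:.1f} Democratic wins, P(<= 3) = {p_three_or_fewer:.3f}")
print(f"high pool: expected {expected_rep_wins:.1f} Republican wins, P(0) = {p_no_rep_wins:.3f}")
```

Under these assumptions, neither pool's tail probability falls below 5%, so neither shortfall of upsets is very unlikely on its own.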

Now consider the 35 races in which the probability of a Democratic win was between 25 and 75%. Of these, 15 had a probability between 25 and 50%, with an average of 36.7%; the Democratic candidate won five of them, or 33.3%. The other 20 had a probability between 50 and 75%, with an average of 61.4%; the Democratic candidate won 16 of them, or 80%.
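Grouping races into these probability bins can be written as a small Python function. The bin edges follow the text, but how races exactly on a boundary (for example, a probability of exactly 50%) were assigned is not stated, so this sketch assumes half-open bins [lo, hi), with the last bin also including 100%. `calibration_table` and the example data are hypothetical.

```python
def calibration_table(probs, dem_won, edges=(0.0, 0.25, 0.50, 0.75, 1.0)):
    """Compare the mean forecast probability with the Democratic win rate in each bin.

    probs: forecast probability of a Democratic win for each race (0 to 1).
    dem_won: matching booleans, True if the Democratic candidate won.
    Bins are half-open [lo, hi) except the last, which also includes its upper
    edge (an assumed boundary rule).
    """
    if len(probs) != len(dem_won):
        raise ValueError("probs and dem_won must have the same length")
    last = edges[-1]
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = [(p, won) for p, won in zip(probs, dem_won)
                  if lo <= p < hi or (hi == last and p == last)]
        if not in_bin:
            continue  # skip empty bins rather than divide by zero
        mean_p = sum(p for p, _ in in_bin) / len(in_bin)
        win_rate = sum(won for _, won in in_bin) / len(in_bin)
        rows.append((lo, hi, len(in_bin), mean_p, win_rate))
    return rows


# Illustrative data only (not the 2018 results).
probs = [0.02, 0.30, 0.45, 0.60, 0.70, 0.99, 1.00]
dem_won = [False, False, True, True, False, True, True]
rows = calibration_table(probs, dem_won)
for lo, hi, n, mean_p, win_rate in rows:
    print(f"{lo:.0%}-{hi:.0%}: {n} races, mean probability {mean_p:.1%}, Democratic wins {win_rate:.1%}")
```

For a well-calibrated forecast, the mean probability and the win rate in each bin should be close, within the scatter expected from the number of races in the bin.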

An alternative way to display the results is as follows. Races are sorted by the Deluxe forecast probability of a Democratic winner, from lowest to highest. Starting with a window from race 1 to race *n* (with *n* set empirically at 15), two quantities are calculated: the average forecast probability across all races in the window, and the fraction of Democratic winners in the window (the number of Democratic wins divided by the window size). The window is then moved to races 2 to *n*+1 and the calculations are repeated, continuing until the window has moved across the entire set of races. These results are shown in Figure 6.
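The sliding-window procedure can be sketched in Python as follows. `sliding_calibration` is a hypothetical name, probabilities are expressed as fractions from 0 to 1, and races with equal probabilities keep their input order because Python's sort is stable. With all 435 races and *n* = 15, the window takes 435 - 15 + 1 = 421 positions.

```python
def sliding_calibration(probs, dem_won, n=15):
    """Sort races by forecast probability, then slide a window of n races across them.

    Returns (mean forecast probability, fraction of Democratic wins) for each
    window position: races 1..n, 2..n+1, and so on to the last race.
    """
    if len(probs) != len(dem_won):
        raise ValueError("probs and dem_won must have the same length")
    if not 1 <= n <= len(probs):
        raise ValueError("window size must be between 1 and the number of races")
    # Sort by probability only; ties keep their original order (stable sort).
    ordered = sorted(zip(probs, dem_won), key=lambda pair: pair[0])
    points = []
    for start in range(len(ordered) - n + 1):
        window = ordered[start:start + n]
        mean_p = sum(p for p, _ in window) / n
        win_frac = sum(won for _, won in window) / n
        points.append((mean_p, win_frac))
    return points


# Small illustrative example with a window of 3 races.
pts = sliding_calibration([0.9, 0.1, 0.5, 0.2, 0.8],
                          [True, False, True, False, True], n=3)
for mean_p, win_frac in pts:
    print(f"mean probability {mean_p:.3f}, Democratic win fraction {win_frac:.3f}")
```

Plotting the win fraction against the mean probability for each window position gives a curve like the one in Figure 6; for a perfectly calibrated forecast, the points would scatter around the line y = x.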

If the election results perfectly matched the forecast probabilities, the points in this plot would lie along a straight line on which the fraction of Democratic winners equals the average forecast probability, although some scatter is expected because of the probabilistic nature of the forecast. The curve does pass approximately through the center of the plot, indicating that the forecast performed well for races with probabilities near 50%. The slight S-shape of the curve reflects the fact that fewer major upsets occurred than the forecast probabilities implied.

## Conclusions

The FiveThirtyEight forecasts of the 2018 congressional elections were quite accurate in predicting the percentages of votes received by the candidates and in estimating the probabilities for particular election outcomes. The inclusion of data in addition to weighted and corrected polling data improved the accuracy of the predictions. As with all scientific analyses, it is important to keep in mind both the core predictions and the associated uncertainties in these predictions.