2018 U.S. House Predictions

How the odds have changed

The model is continuously updated with new polling data and approval ratings. As new information comes in, the odds may shift towards one party or another. Additionally, as we get closer to the election, there is less uncertainty in the result, since there is less time for a large shift in public opinion. This also changes the odds over time.

The chart below plots the changing odds of each party winning control. The estimates are plotted on a log-odds scale, instead of a traditional probability scale ranging from 0 to 100%. The log-odds scale better reflects the significance of a change in probability: a shift from a 90% to a 95% chance is much more consequential than a shift from 50% to 55%.
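To make the comparison concrete, the short sketch below converts the two five-point shifts mentioned above into log-odds units. The helper name `log_odds` is ours and is not taken from the model's code.

```python
import math

def log_odds(p: float) -> float:
    """Convert a probability in (0, 1) to log-odds (the logit)."""
    return math.log(p / (1.0 - p))

# Both shifts are five percentage points on the probability scale.
high_shift = log_odds(0.95) - log_odds(0.90)
mid_shift = log_odds(0.55) - log_odds(0.50)

print(f"90% -> 95%: {high_shift:.2f} log-odds units")  # 0.75
print(f"50% -> 55%: {mid_shift:.2f} log-odds units")   # 0.20
```

On this scale the move from 90% to 95% is more than three times as large as the move from 50% to 55%, which is why the chart stretches out the ends of the probability range.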

Distribution of outcomes

The model doesn't produce a single estimate of the number of seats each party will win. Rather, it estimates a probability for each possible arrangement of seats. The model's overall guess is the median of the distribution of seats, the point at which the Democrats are just as likely to win more seats as fewer.

The table below summarizes the relative chances of each possible distribution of seats. The current distribution and the median outcome are highlighted. 218 seats are needed to control the House.

Dem. Seats	Rep. Seats	Majority	Dem. Gain	Cumulative Likelihood*

*The chances that the Democrats will earn at least this many seats.
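As a rough illustration of how the median and the cumulative likelihood column could be computed from a set of simulated outcomes, consider the sketch below. The function name and the ten seat counts are hypothetical and made up for this example; only the 218-seat threshold comes from the text.

```python
from statistics import median

def summarize_seats(dem_seat_draws, majority=218):
    """Summarize simulated Democratic seat counts (one integer per draw)."""
    n = len(dem_seat_draws)
    med = median(dem_seat_draws)

    def at_least(k):
        # Share of draws in which Democrats win at least k seats.
        return sum(s >= k for s in dem_seat_draws) / n

    # Chance of control is the cumulative likelihood at the majority line.
    return med, at_least(majority), at_least

# Hypothetical draws, invented for illustration only.
draws = [205, 212, 218, 221, 224, 224, 230, 236, 241, 250]
med, p_control, at_least = summarize_seats(draws)

print(med)            # 224.0
print(p_control)      # 0.8
print(at_least(230))  # 0.4
```

A real forecast would use many thousands of draws rather than ten, but the bookkeeping is the same.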

National Polling

The basis of the model is the generic congressional ballot, which asks survey respondents which party's candidate they plan on supporting in their local House race. The model estimates the true support for each party over time, and forecasts this estimate forwards to Election Day. The chart below plots these estimates, along with an accompanying error bar indicating the uncertainty in the estimates. Notice how the uncertainty increases dramatically from today towards Election Day.

Model

The model operates in two broad stages: first, it estimates national voter intent on a weekly basis and forecasts this estimate forwards to Election Day; second, it uses its voter intent estimate to predict the number of seats won by each party, using past elections as a guide. The models for each of these stages are described in more detail below.

Each model is fully Bayesian and is estimated using Hamiltonian Monte Carlo (through the statistical package Stan). Weakly informative priors are used on all parameters. The voter intent model is re-estimated with each new set of polling data. The final results model, in contrast, is fit once on past election data, and the posterior samples are used to provide predictions for the 2018 elections.

Voter Intent Model

Voter intent is estimated from generic ballot polls conducted by various polling firms. These ask survey respondents which party's candidate they plan on supporting in their local House race. By aggregating poll results we can arrive at an estimate of national voter intent.

Voter intent is estimated on a weekly basis, from the first week of polling through Election Day. It is measured on a log-odds scale and is denoted $\mu_t$, where $t$ is the week in question. Intent is assumed to evolve as an AR(1) process, or mean-reverting random walk: $$ \mu_t \sim \mathrm{t}_{\nu}(\rho\cdot\mu_{t-1}, \sigma^2_w), $$ where $\rho$ is a parameter representing the strength of the mean reversion (restricted to be between 0 and 1), $\sigma^2_w$ is the variance of the random walk, and $\nu$ is the degrees of freedom of the t-distribution. The heavier tails of the t-distribution allow more extreme shifts in public opinion to occur, which may happen after a significant political development. While a more accurate generative model might be a finite mixture model in which public opinion is static most of the time, but able to change dramatically after a major shock, the added complexity of such a model would be unlikely to provide a commensurate increase in accuracy. The priors for the parameters are: $$ \begin{align} \mu_0 &\sim \mathrm{t}_{\nu}(0, 10^2) \\ \nu &\sim \mathrm{Gamma}(2, 0.1) \\ \sigma_w &\sim \mathrm{Cauchy}^+(0, 10) \\ \rho &\sim \mathrm{Beta}(2, 1) \end{align}$$
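The process above can be simulated forwards in a few lines. This is a minimal sketch and not the model's Stan code: the parameter values are hypothetical, $\sigma_w$ is treated as the scale of the t-distributed weekly shock, and the t draw is built from a normal and a chi-squared variate.

```python
import math
import random

def student_t(rng, nu):
    """Draw from a standard Student-t with nu degrees of freedom."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(nu / 2.0, 2.0)  # chi-squared with nu d.o.f.
    return z / math.sqrt(chi2 / nu)

def simulate_intent(mu0, rho, sigma_w, nu, weeks, seed=0):
    """Simulate mu_t = rho * mu_{t-1} + sigma_w * (t-distributed shock)."""
    rng = random.Random(seed)
    path = [mu0]
    for _ in range(weeks):
        path.append(rho * path[-1] + sigma_w * student_t(rng, nu))
    return path

def inv_logit(x):
    """Map log-odds back to a proportion between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical parameter values, chosen only for illustration.
path = simulate_intent(mu0=0.10, rho=0.95, sigma_w=0.02, nu=5.0, weeks=20)
dem_share = [inv_logit(m) for m in path]
```

With $\rho$ below 1 each week's value is pulled back towards zero on the log-odds scale (an even split), while a small $\nu$ makes the occasional large jump more likely.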

Polling results are derived from national voter intent. For each poll $i$ we record the sample size, $n_i$; the week it was conducted, $t_i$; the polling firm which conducted it, $f_i$; the number of respondents picking either major party (as opposed to “undecided”), $n^s_i$; and the number picking the Democratic party, $n^d_i$. Both $n^s_i$ and $n^d_i$ are drawn from binomial distributions, $$ \begin{align} n^s_i &\sim \mathrm{Binom}(n_i, 1 - \pi_u) \\ n^d_i &\sim \mathrm{Binom}(n^s_i, \pi_i) \end{align}$$ where $\pi_u$ is the proportion of undecided voters nationally, and $\pi_i$ is the level of support for the Democrats in the given poll. $\pi_i$ differs from national voter intent $\mu_{t_i}$, since polling firms sample differently and have different methodologies. It is modelled on a log-odds scale as a linear function of national voter intent, polling firm bias, and poll-specific (sampling) error: $$ \mathrm{logit}(\pi_i) = \mu_{t_i} + \alpha_{f_i} + \epsilon_i, $$ where $\alpha_{f_i}$ is the bias of the polling firm $f_i$ and $\epsilon_i$ is the poll-specific error. Both $\alpha_{f_i}$ and $\epsilon_i$ are assumed to be drawn from larger distributions of all possible firm and poll errors: $$ \begin{align} \alpha_f &\sim \mathrm{t}_2(\mu_f, \sigma^2_f) \\ \epsilon_i &\sim \mathrm{t}_2(0, \sigma^2_\epsilon) \end{align}$$ The priors for the hyperparameters are $$ \begin{align} \mu_f &\sim \mathcal{N}(0, 0.02) \\ \sigma_f &\sim \mathrm{t}^+_4(0, 1) \\ \sigma_\epsilon &\sim \mathrm{t}^+_4(0, 1) \end{align}$$ Polling firm errors are not fixed to have a mean of zero, since in a given election year there will be an aggregate polling error shared by all firms. This is generally within two percentage points, hence the prior on $\mu_f$.
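Read generatively, the polling model says how a single poll's counts would arise from voter intent. The sketch below simulates one poll under that reading. It assumes $\pi_u$ is the undecided share, so a respondent names a major party with probability $1 - \pi_u$; all numeric inputs and function names are hypothetical, and the firm bias and poll error are passed in directly rather than drawn from their t-distributions.

```python
import math
import random

def inv_logit(x):
    """Map log-odds back to a proportion between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def binomial(rng, n, p):
    """Draw a binomial count as the sum of n Bernoulli(p) trials."""
    return sum(rng.random() < p for _ in range(n))

def simulate_poll(rng, n, mu_t, alpha_f, eps, pi_u):
    """Return (n_s, n_d): decided respondents and Democratic supporters."""
    # Assumption: pi_u is the undecided share, so 1 - pi_u pick a major party.
    n_s = binomial(rng, n, 1.0 - pi_u)
    # Poll-level Democratic support: intent plus firm bias plus poll error.
    pi_i = inv_logit(mu_t + alpha_f + eps)
    n_d = binomial(rng, n_s, pi_i)
    return n_s, n_d

# Hypothetical inputs, for illustration only.
rng = random.Random(1)
n_s, n_d = simulate_poll(rng, n=1000, mu_t=0.10, alpha_f=0.02, eps=-0.01, pi_u=0.15)
```

Fitting the model runs this logic in reverse: given the observed counts from many polls, it infers the voter intent, firm biases, and poll errors that most plausibly produced them.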


Final Results Model

The final results model is essentially a linear model regressing the change in Democratic seats on national voter intent and several other structural and economic covariates. Importantly, measurement error in voter intent is explicitly modelled.

The model is fit using data from the 1974, 1982, 1994, 1998, 2000, 2002, 2004, 2006, 2008, 2010, 2012, and 2014 House elections, which were all the years in which generic ballot polling data were available.

The number of seats won by the Democratic party in election $j$ is denoted $s_j$. Voter intent at Election Day, measured on a log-odds scale, is denoted $\mu$, and all other covariates are included in a vector $X$. Then the model specification is $$ s_j-s_{j-1} \sim \mathcal{N}(\beta_0 + \beta_1\mu + X\vec\beta, \sigma^2), $$ where $\sigma$ is the standard deviation of the residual error. However, we cannot observe $\mu$ directly; we can only obtain an estimate of it, $\tilde\mu$, with some uncertainty $\sigma^2_\mu$ (both of which are taken from the posterior distribution of the voter intent model, which is very close to normal): $$ \tilde\mu \sim \mathcal{N}(\mu, \sigma^2_\mu). $$ The priors for the parameters are $$ \begin{align} \mu &\sim \mathcal{N}(0, 1) \\ \beta_0 &\sim \mathrm{t}_3(0, 10) \\ \beta_1 &\sim \mathrm{t}_3(0, 10) \\ \vec\beta &\sim \mathrm{t}_3(0, 10) \\ \sigma &\sim \mathrm{Cauchy}^+(0, 20) \end{align}$$
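One way to see why the measurement error matters is to push both sources of uncertainty through the regression by simulation. The sketch below is a simplified forward simulation with fixed, hypothetical coefficients; the actual model instead uses posterior samples of every parameter. It treats `sd_mu` and `sigma` as standard deviations and collapses the contribution of the other covariates into a single number, `other_effects`.

```python
import math
import random
from statistics import mean, pstdev

def simulate_seat_change(mu_hat, sd_mu, beta0, beta1, other_effects,
                         sigma, draws=20000, seed=0):
    """Simulate seat changes, propagating uncertainty in voter intent."""
    rng = random.Random(seed)
    out = []
    for _ in range(draws):
        mu = rng.gauss(mu_hat, sd_mu)            # uncertain voter intent
        expected = beta0 + beta1 * mu + other_effects
        out.append(rng.gauss(expected, sigma))   # residual error
    return out

# Hypothetical values, for illustration only.
sims = simulate_seat_change(mu_hat=0.10, sd_mu=0.05, beta0=0.0,
                            beta1=100.0, other_effects=5.0, sigma=10.0)

# With both errors normal, the spread combines as
# sqrt(beta1^2 * sd_mu^2 + sigma^2).
theory_sd = math.sqrt((100.0 * 0.05) ** 2 + 10.0 ** 2)
print(round(mean(sims), 1), round(pstdev(sims), 1), round(theory_sd, 1))
```

Ignoring the uncertainty in $\tilde\mu$ would leave only the residual term, making the forecast look more certain than it is.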

The model forecasts the change in Democratic seats, which is highly dependent on whether or not there is a Democratic president. Consequently, most of the terms in the model are interacted with an indicator representing the incumbent party in the White House (1 if a Democrat, –1 if a Republican). The full set of additional model covariates is described in the following table:

Term	Expected effect this election	Notes
$midterm$	–7 seats	The midterm indicator takes on a value of 1 in midterm elections and 0 otherwise. Regardless of the party in power, Democrats generally underperform in midterm elections due to low turnout.
$pres$	+1 seat	The White House control indicator.
$midterm\times pres$	+7 seats	This is one of the most important terms in the model, since it captures the tendency of midterm elections to swing dramatically against the party of the incumbent president.
$appr\times pres$	+0.3 seats	appr is the incumbent president’s approval rating. A popular president helps down-ballot races.
$earn\times pres$	+0 seats	earn is production and nonsupervisory hourly earnings growth over the previous year. This term and the next capture voters’ perception of the state of the economy, which is important in judging the president’s party.
$unemp\times pres$	+0 seats	unemp is the current unemployment rate.
$before - 218$	+1 seat	before is the number of Democratic seats going into the election, so $before - 218$ is the current Democratic surplus or deficit of seats.
$midterm\times(before - 218)$	+17 seats
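The interaction terms in the table can be assembled mechanically from a handful of inputs. The sketch below shows one way to do so; the function name, the dictionary keys, and the input values are hypothetical, and the units of the approval and economic inputs are assumptions, since the text does not specify them.

```python
def build_covariates(midterm, dem_president, approval, earnings_growth,
                     unemployment, dem_seats_before, majority=218):
    """Build the covariate terms listed in the table for one election."""
    # White House indicator: 1 for a Democrat, -1 for a Republican.
    pres = 1 if dem_president else -1
    # Democratic surplus (positive) or deficit (negative) of seats.
    surplus = dem_seats_before - majority
    return {
        "midterm": midterm,
        "pres": pres,
        "midterm x pres": midterm * pres,
        "appr x pres": approval * pres,
        "earn x pres": earnings_growth * pres,
        "unemp x pres": unemployment * pres,
        "before - 218": surplus,
        "midterm x (before - 218)": midterm * surplus,
    }

# Hypothetical inputs: a midterm under a Republican president.
row = build_covariates(midterm=1, dem_president=False, approval=40.0,
                       earnings_growth=2.5, unemployment=4.0,
                       dem_seats_before=200)
print(row["midterm x pres"])  # -1
print(row["before - 218"])    # -18
```

Because `pres` flips sign with the party in the White House, the same coefficient can help Democrats under a Republican president and hurt them under a Democratic one.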

Model data and code

CORY McCARTAN

A statistical model for the midterm elections, using polling data, approval ratings, structural factors, and economic indicators.