
Polls Differ, and That’s Okay

People don’t trust polling, even though polling is impressively accurate and models improve year after year.

If polls are so “scientific” then why do they differ?

Polling is considered a research methodology in the social sciences and economics, but with a lot of caveats. Like any research method dealing with human experiences, it’s difficult to replicate polling results. Polling involves a lot of choices on the part of the pollster and the data scientists generating results.

Let’s assume five research firms are hired to analyze likely vehicle purchases in the coming year. All five will have slightly different results.

The first batch of choices includes what to ask, how to phrase questions, and to whom the questions will be posed. Each company will develop a slightly different survey.

Having seen such surveys, I know some of the questions:

  • Are you currently considering a vehicle purchase?
  • Do you plan to purchase within the next three months, six months, or year?
  • How old is your current vehicle?
  • Did you purchase the vehicle new or used?
  • How many vehicles have you purchased or leased?
  • Has your household size changed in the last year?
  • And many more…

By obtaining zip code information, you can estimate average educational level, household income, and even typical vehicle ownership in a region. These data will be used for analysis, too.

Imagine each firm receives 1000 responses.

It’s time to weight the results. If this is a national survey, the data experts will balance the data to reflect national demographics. For a state or region, weightings will differ, but the same mathematical processes are used.

Imagine 22 percent of the responses were from Black car buyers. Since we know 13.4 percent of adults in the United States are Black, the statisticians need to adjust the results of the eventual analysis to match the actual population, not the surveyed population. If a survey over-samples or under-samples in the extreme, it might be necessary to conduct more surveys.
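The arithmetic behind that adjustment is straightforward. Here is a minimal sketch of post-stratification weighting using the shares from the example above; the two-group breakdown is a simplification, and real weighting schemes balance many demographic dimensions at once:

```python
# Sketch of post-stratification weighting: scale each respondent's
# influence so the sample matches known population shares.
# Two groups only, for illustration; real surveys weight on many axes.

population_share = {"Black": 0.134, "Other": 0.866}  # known adult shares
sample_share = {"Black": 0.22, "Other": 0.78}        # shares in the 1000 responses

# Each respondent's weight is (population share) / (sample share) for their group.
weights = {group: population_share[group] / sample_share[group]
           for group in population_share}

print(weights)
# Over-sampled groups get weights below 1; under-sampled groups get
# weights above 1, so weighted totals reflect the actual population.
```

With these weights, an over-sampled group counts for less than one response per person and an under-sampled group for slightly more, which is exactly the balancing the statisticians perform.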

The modeling to predict purchases requires assumptions, based on past data. This leads to more choices:

  • How many people answering “yes” are actually going to buy a new vehicle?
    More people say they intend to buy a vehicle than actually do in a given quarter.
  • What is the average age of vehicles on the road?
    If someone says they are on the market and the current vehicle is near or over that average age, then we might assume that shopper is more serious.
  • Is the zip code performing well economically?
    People buy cars in clusters, likely unconsciously, so when neighbors are buying cars, that changes calculations.
  • Has the household size changed?
    Adding or subtracting people from a household leads to an increased likelihood of a vehicle purchase. Ask any new parent or empty-nester.

There might be 20 survey questions and a purchased demographic database with another 50 or more data points (far more data than within the survey, in my experience). Now, with 70 or more data points, every firm is going to weight the variables in a proprietary manner. These choices are informed by years of experience making models.
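To make the variable-weighting idea concrete, here is a toy purchase-likelihood score. The inputs echo the survey questions above, but the weights, the age threshold, and the function itself are entirely hypothetical; a real firm tunes dozens of such weights against years of historical outcomes:

```python
# Hypothetical purchase-likelihood score combining a few survey answers.
# Weights and thresholds are invented for illustration only.

def purchase_score(shopping_now, vehicle_age_years, household_changed):
    score = 0.0
    if shopping_now:                # says they are on the market
        score += 0.40
    if vehicle_age_years >= 12:     # near or over the average vehicle age
        score += 0.30
    if household_changed:           # new parent, empty-nester, etc.
        score += 0.20
    return score

# A serious shopper: on the market, old car, household just changed.
print(purchase_score(True, 13, True))   # high score, near 0.9
```

The proprietary part is not the arithmetic; it is knowing, from experience, which variables deserve which weights.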

My students assume political polling is more complex, but the models I’ve seen firsthand suggest consumer surveys and political polling are comparably sophisticated.

When polling voters, you’re asking variations of the car buyer questions.

  • Do you intend to vote in the upcoming election?
  • Have you voted in the last two elections?
  • Have you donated to any candidates this election cycle?
  • Have you volunteered for any campaign in the past?
  • And so on…

Every pollster will ask slightly different questions. They will also obtain voter registration data, demographic data, and some additional data purchased from third parties. I know at least two firms that obtain Google search data for the purpose of predicting turnout. If people are searching for polling places, those households have one or more likely voters.

The data obtained and analyzed might seem absurd, but every bit of data reveals something about the potential voter. Race, gender, religion, income, education level, and much more are considered when constructing a model for poll results.

Once the data are consolidated, the statisticians have to generate a model of likely turnout and likely results.
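One simple way such a model might combine the pieces: give each respondent a survey weight and a turnout probability, then compute the turnout-weighted share of support. This is a bare-bones sketch with invented numbers, not any firm’s actual method:

```python
# Sketch: turning weighted responses into a topline estimate.
# Each respondent has a survey weight, a modeled turnout probability,
# and a stated preference. All numbers are invented.

respondents = [
    # (survey weight, turnout probability, supports candidate A)
    (1.1, 0.9, True),
    (0.9, 0.5, False),
    (1.0, 0.8, True),
    (1.0, 0.3, False),
]

expected_voters = sum(w * p for w, p, _ in respondents)
expected_a = sum(w * p for w, p, supports_a in respondents if supports_a)
share_a = expected_a / expected_voters   # turnout-weighted support for A

print(round(share_a, 3))
```

Notice that the topline depends as much on the turnout probabilities as on the stated preferences, which is why likely-voter assumptions move poll results so much.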

I know a polling firm that also includes weather forecasts in its final turnout predictions. Everything matters. A snowstorm in major cities reduces turnout significantly on Election Day.

Though much of this work is proprietary and carefully guarded, the results of polls can be compared. That’s what FiveThirtyEight does, without knowing the precise methods of the polls they use to create an aggregate analysis.

A polling firm that is within 2 percent of the outcome year after year is going to be weighted more in the FiveThirtyEight aggregate model than a firm that is erratic from year to year. Also, if a pollster is incredibly consistent, but over or under by a certain amount, that’s still good for modeling. A polling firm over-modeling Democratic voter turnout by 1.5 percent year after year? That’s an easy adjustment in an aggregate analysis. (Of course, you’d also hope that firm fixes its models.)

When data are off by 1.5 percent year after year, that polling model is highly precise. It’s just not accurate. By adjusting for the known error, an aggregator can get both precision and accuracy.
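A toy example of how an aggregator might correct a consistent house effect. The error history and poll number below are invented, and real aggregators use more elaborate adjustments than a simple average:

```python
# Sketch: correcting a consistent house effect.
# A pollster that overstates one side by ~1.5 points every cycle is
# precise (small spread) but not accurate (biased). Subtracting the
# historical bias recovers an accurate estimate.

past_errors = [1.4, 1.6, 1.5, 1.5]   # poll minus actual result, in points
house_effect = sum(past_errors) / len(past_errors)

new_poll = 52.0                       # this cycle's raw topline
adjusted = new_poll - house_effect    # bias-corrected estimate

print(adjusted)
```

An erratic pollster, by contrast, offers no stable error to subtract, which is why consistency matters more to an aggregator than raw accuracy.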

No two pollsters are going to use the same assumptions about turnout. What is a likely voter? Honestly, it’s something of an educated guess based on historical data.

One reason Democrats poll better is that there are more registered Democrats than any other party. But they also vote at a lower rate. Therefore, models must forecast who will vote based on the current environment.

A final thought. Things happen.

We could model new car purchases, and companies did, but then the COVID-19 pandemic cratered car sales. We cannot model the unexpected. That seems obvious, but the unexpected happens in politics and in economics.

One big news story could change an election. One major weather event could reduce turnout.

Modeling is useful. It helps businesses, governments, and political candidates. Still, we should never put complete faith in any model to be perfect.

 

