Wednesday, October 28, 2015

How to fabricate a presidential race





Hold the presses!  Ben Carson is edging ahead of Donald Trump!

Or so the New York Times informs us, right in the headline about its co-sponsored survey of Republican voters.  A little further down in the story, the Times confides that the difference between the proportions of respondents who prefer Carson and Trump for president is “within the margin of sampling error.” Having gotten that irksome ritual phrase out of the way, the Times goes on to analyze the race as if Carson really were in the lead.

Unfortunately for the Times, the “margin of error” is exactly that.  The race is too close to call.  In the survey, the difference between the proportions supporting Carson and Trump is so small that among all Republicans it might well be zero.  Had the Times and CBS News surveyed voters the day before, they might easily have found Trump on top.

Sometimes polls are meaningful even when we can’t generalize from them.  For example, suppose that we survey all Times reporters and find that 51% of them are Republicans. We might not be able to deduce that most reporters in the US are Republicans, but at least we know that to be true of one of the country’s most influential newspapers.  But in the case at hand, the sample has no meaning other than what we can infer about Republicans in general.  The fact that Carson is slightly ahead of Trump in the sample has no significance.  It wouldn’t merit a headline in even the Western Succotash News-Free Press.

Journalists sometimes dismiss the margin of error because they think that it refers to mistakes by the surveyors, such as coding the wrong answer.  Surely, they reason, if the surveyors are careful, the “margin of error” will be just an empty phrase.

In reality, the margin of error refers to all chance events that affect the outcome of the survey.  Most of them are unavoidable. For example, the sample of voters surveyed is virtually never a perfect mirror of all Republicans.  Even if it is, the responses are still subject to randomness.  An indifferent voter may state a preference for Carson today and Trump tomorrow. 

Before declaring that Carson (or Trump) is in the lead, we must calculate the size of random factors.  If we are willing to live with only a 5% chance of being wrong about the leader, and the survey responses vary so much from day to day that there’s a 10% chance of being wrong, then we should confess that we don’t really know who’s in the lead.  That won’t get us on page one, but it happens to be the truth.
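To make "the size of random factors" concrete, here is a minimal sketch of one standard way to put a 95% confidence interval around a candidate's lead when both shares come from the same sample. The shares (26% and 22%) and the sample size (600) are hypothetical numbers chosen for illustration, not figures taken from the Times/CBS News survey, and the function name is my own. The calculation covers only sampling variation under simple random sampling; real polls use weighting that usually widens the interval.

```python
import math

def lead_confidence_interval(p_a, p_b, n, z=1.96):
    """Approximate confidence interval for p_a - p_b, where both shares
    come from the same simple random sample of n respondents.
    z=1.96 corresponds to a 95% interval."""
    diff = p_a - p_b
    # Variance of the difference of two shares from one multinomial sample:
    # Var(p_a - p_b) = [p_a + p_b - (p_a - p_b)^2] / n
    variance = (p_a + p_b - diff ** 2) / n
    margin = z * math.sqrt(variance)
    return diff - margin, diff + margin

# Hypothetical poll: candidate A at 26%, candidate B at 22%, 600 respondents.
low, high = lead_confidence_interval(0.26, 0.22, 600)
print(f"Lead: 4.0 points, 95% interval: [{low:.1%}, {high:.1%}]")
if low <= 0 <= high:
    print("The interval includes zero: too close to call.")
else:
    print("The interval excludes zero: the lead is statistically significant.")
```

With these made-up numbers, a four-point lead in the sample is still compatible with a tie, or even a small deficit, among all voters.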

 Leon Taylor, tayloralmaty@gmail.com

Notes

These points also apply to the survey questions about voter attitudes, about which the Times speculates in great detail but without a single reference to a confidence interval.  Given the amount of dough that the Times and CBS News are spending on the survey, they would be well within their rights to demand that the pollsters provide a 95% confidence interval for each question. 

Such an interval means this: If we took 100 random samples of Republicans and computed a 95% confidence interval from each, then we would expect about 95 of those intervals to include the true percentage of all Republicans backing a given position.  For example, suppose that the confidence interval for the difference in shares of respondents backing Carson and Trump is [-2%, 2%].  Then the data are consistent with any true difference between -2 percentage points (Trump leads slightly) and 2 percentage points (Carson leads slightly).  Since the confidence interval includes zero, we cannot rule out the possibility that among all Republicans the race is a dead heat.

Now suppose instead that the 95% confidence interval is [2%, 4%].  Then the data point to a Carson lead of 2 to 4 percentage points among all Republicans.  Since an interval built this way misses the true difference in only about 5 of 100 samples, and this one excludes both a tie and a Trump lead, we could safely conclude that Carson is the leader among all Republicans.
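The repeated-sampling idea behind a confidence interval can be checked with a small simulation. The sketch below assumes a hypothetical population in which two candidates are exactly tied at 24% each, draws many simulated polls of 600 respondents, builds a 95% interval for the lead from each poll, and counts how often the interval contains the true difference (zero). All numbers and the function name are illustrative assumptions, and the simulation ignores everything except simple random sampling.

```python
import math
import random

def simulate_coverage(true_a, true_b, n, polls=1000, z=1.96, seed=0):
    """Share of simulated polls whose confidence interval for the lead
    (candidate A minus candidate B) contains the true difference."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    true_diff = true_a - true_b
    covered = 0
    for _ in range(polls):
        count_a = count_b = 0
        for _ in range(n):
            u = rng.random()
            if u < true_a:
                count_a += 1          # respondent backs candidate A
            elif u < true_a + true_b:
                count_b += 1          # respondent backs candidate B
            # otherwise: another candidate or undecided
        p_a, p_b = count_a / n, count_b / n
        diff = p_a - p_b
        margin = z * math.sqrt((p_a + p_b - diff ** 2) / n)
        if diff - margin <= true_diff <= diff + margin:
            covered += 1
    return covered / polls

# Hypothetical dead heat: both candidates truly at 24% among all voters.
coverage = simulate_coverage(0.24, 0.24, 600)
print(f"Intervals containing the true difference: {coverage:.1%}")
```

The printed share should land close to 95%. The remaining polls, roughly one in twenty, would show a "significant" leader in a race that is in fact tied, which is the risk we accept when we use a 95% interval.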

  
 References 

Jonathan Martin and Dalia Sussman.  Poll watch: Ben Carson edges ahead nationally in Times/CBS News poll.  New York Times.  October 28, 2015.