lederhosen: (Default)
[personal profile] lederhosen
(Not sure whether to use my politics icon or my maths icon for this one.)

Since a couple of y'all have been discussing this: there has been much mention of the alleged fact that 70% of African American voters in California voted for Prop 8. But how reliable is that number?

First off, there are a couple of different types of error that can affect a poll. One is 'sampling error', which comes from random variation in who you select for your poll. To understand sampling error, think of tossing a coin a hundred times - you would expect to get 50 heads and 50 tails, but because every toss is random and unrelated to the last, you could easily end up with 40-60 or 60-40.
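To put a rough number on the coin example, here is a minimal Python sketch that adds up the exact binomial probabilities for a fair coin. The function name is just something I made up for illustration, and the only assumption is that the tosses are fair and independent.

```python
from math import comb


def prob_at_least_this_lopsided(n_tosses: int, deviation: int) -> float:
    """Chance that a fair coin lands at least `deviation` heads away from n/2."""
    half = n_tosses / 2
    favourable = 0
    for heads in range(n_tosses + 1):
        # Count every outcome that is this far from an even split, in either direction.
        if abs(heads - half) >= deviation:
            favourable += comb(n_tosses, heads)
    # Each of the 2**n sequences of a fair coin is equally likely.
    return favourable / 2 ** n_tosses


p = prob_at_least_this_lopsided(100, 10)
print(f"Chance of a 40-60 split or worse in 100 tosses: {p:.3f}")
```

The answer comes out a little under 6%, so a split that lopsided is uncommon but a long way from impossible.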

You can reduce sampling error by taking a larger sample. As a rule of thumb, if you're sampling people completely at random, independently of one another, your margin of error for something like this is about 1/sqrt(n) in either direction, where n is the number of people polled. For instance, a lot of political polls work on samples of around a thousand people, which gives them a margin of error of about 1/sqrt(1000), i.e. about 1/32, or roughly 3%. In the end, your sample size is a compromise between accuracy and expense.

[For stats geeks: this is the 95% confidence interval, and that approximation breaks down if the true underlying distribution is a long way off 50-50, but it's good enough for present purposes.]
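For anyone who wants to see how close the rule of thumb is, here is a rough Python sketch comparing 1/sqrt(n) with the usual textbook normal-approximation margin, 1.96 * sqrt(p(1-p)/n). Both assume simple random sampling; the function names and the 70% example proportion are my own choices for illustration.

```python
from math import sqrt

Z_95 = 1.96  # normal critical value for a two-sided 95% interval


def rule_of_thumb_margin(n: int) -> float:
    """The quick 1/sqrt(n) approximation."""
    return 1 / sqrt(n)


def normal_margin(n: int, p: float = 0.5) -> float:
    """Normal-approximation 95% margin for a true proportion p."""
    return Z_95 * sqrt(p * (1 - p) / n)


for n in (100, 1000, 2240):
    print(
        f"n={n:5d}  rule of thumb={rule_of_thumb_margin(n):.3f}  "
        f"p=0.5: {normal_margin(n):.3f}  p=0.7: {normal_margin(n, 0.7):.3f}"
    )
```

At 50-50 the two agree almost exactly (1.96 * 0.5 is very nearly 1), and as the true proportion moves away from 50-50 the rule of thumb errs on the side of overstating the margin.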

The main intent of exit polls is to pick the overall outcome of a vote - will California go to Obama, or McCain? Will Proposition 8 pass or fail? The CNN exit poll for California had a sample size of 2240 people, which will give you about a 2% margin of sampling error (actually a bit more, for reasons covered below). Not perfect, but close enough to be useful.

Now, these polls also collect demographic data that can let you examine how the vote broke down among groups of interest - handy if you want to compare men vs. women, old vs young, etc. But it's not the primary purpose of the poll, and so the sample size isn't chosen to guarantee a certain level of accuracy in those 'subpopulation' estimates.

(Why, yes, [livejournal.com profile] lederhosen HAS been working on sample design for subpopulation estimates for most of the last year...)

What this means is that you should not assume the raw number reported there means anything at all, unless you have a margin of error for it.

In this case, the CNN poll notes that African Americans are 10% of the overall sample, i.e. around 224 people. This is not a large number, and it means that the margin of sampling error would be fairly large - about 1/sqrt(224), or around 7%.
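Here is that arithmetic as a small Python sketch, using only the figures quoted above (2240 people, 10% of them African American, 70% reported support) and the 1/sqrt(n) rule of thumb. It still assumes simple random sampling, so treat the interval as a best case.

```python
from math import sqrt

total_sample = 2240     # CNN exit poll sample size for California
share_in_sample = 0.10  # African Americans as a share of that sample
reported_yes = 0.70     # reported Yes-on-8 share within the subgroup

subgroup_n = round(total_sample * share_in_sample)
margin = 1 / sqrt(subgroup_n)  # rule-of-thumb 95% margin of sampling error

low, high = reported_yes - margin, reported_yes + margin
print(f"Subgroup size: {subgroup_n}")
print(f"Margin of sampling error: about {margin:.1%}")
print(f"Rough 95% range for the true figure: {low:.0%} to {high:.0%}")
```

So even before any other source of error is considered, the reported 70% is really "somewhere in the 60s or 70s".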

But as I noted above, this assumes that the people in your sample are selected completely independently of one another. This isn't quite true for CNN's exit polls. As discussed here, what they actually do is select a random sample of precincts, and then draw their sample from these precincts. This is common practice among survey organisations (my work included), because it just isn't practical to drive over to one precinct, sample a couple of people, and then go on to the next one. 'Clustering' the sample makes it more cost-effective, but it also means that your selections are no longer independent of one another - if you sample Bob, you're also likely to sample other people from his precinct, who may have similar characteristics.

Calculating how that affects the accuracy of your poll is fiddly, and I doubt CNN/EMR have enough data to do a precise calculation on that. (When I need to do this sort of thing - not for political polling, but for things like employment estimates - I have the luxury of access to 20 GB of individual Census records, which makes life easier, but it's still fairly fiddly.) But at a very rough guess, this effect will probably increase that margin from around 7% to around 10-15%. (Note that the magic numbers relevant to clustering for the general population may not translate to a given subpopulation, so the impact of clustering could be worse than for the general population.)
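One way to get a feel for that rough guess is to ask what it implies about the "effective" sample size, i.e. how many independently sampled people would give the same margin. The Python sketch below just inverts the 1/sqrt(n) rule of thumb; the ratio of actual to effective sample size is what survey statisticians usually call the design effect. The 10% and 15% inputs are my rough guesses from above, not figures published for the poll, and the function names are made up for illustration.

```python
def effective_sample_size(margin: float) -> float:
    """Simple-random-sample size whose rule-of-thumb margin equals `margin`."""
    return 1 / margin ** 2


def implied_design_effect(actual_n: int, margin: float) -> float:
    """Actual sample size divided by the effective sample size."""
    return actual_n / effective_sample_size(margin)


subgroup_n = 224  # about 10% of the 2240 people polled

for guessed_margin in (0.10, 0.15):
    n_eff = effective_sample_size(guessed_margin)
    deff = implied_design_effect(subgroup_n, guessed_margin)
    print(
        f"margin {guessed_margin:.0%}: behaves like about {n_eff:.0f} "
        f"independent respondents (design effect of roughly {deff:.1f})"
    )
```

In other words, if the guess is in the right ballpark, those 224 clustered respondents carry about as much information as somewhere between 45 and 100 people chosen independently.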

Throw in the potential for non-sampling error due to systematic biases (e.g. exit polls don't catch absentee voters, some people refuse to answer them, etc etc) and... well, while it's quite possible that African-Americans *were* more favourable to Prop 8 overall, the exit poll is far from clinching evidence for that.

Date: 2008-11-08 02:11 am (UTC)
From: [identity profile] ruth-lawrence.livejournal.com
Thanks for posting this!

Date: 2008-11-08 02:27 am (UTC)
From: [identity profile] darkrosetiger.livejournal.com
Thanks for posting this for the math challenged--like me. :)

There's another number that may affect the accuracy of this poll: 6.7%

That's the total black population of California. It includes people who are under 18 and people who are or have been in prison, which makes them ineligible to vote.

Date: 2008-11-08 11:41 am (UTC)
From: [identity profile] lederhosen.livejournal.com
Hmm. I'd expect that to translate to less than 6.7% black people among registered voters... in which case, CNN getting 10% black voters implies either some sort of sampling bias, or a very high turnout rate for blacks compared to whites (which might be plausible, this time around).

Date: 2008-11-08 01:45 pm (UTC)
From: [identity profile] ambitious-wench.livejournal.com
Remember, too, your location bias; Blacks tend to congregate in geographical locations. I'd bet that the polls selected were in areas with high percentage of black residents.

Date: 2008-11-09 12:36 am (UTC)
From: [identity profile] lederhosen.livejournal.com
Very likely so - and while that's a good way to increase the raw numbers of blacks sampled, it doesn't mean a representative sample of blacks.

Date: 2008-11-09 12:38 am (UTC)
From: [identity profile] ambitious-wench.livejournal.com
I like the way you challenge my math-ineptitude. Thanks.

Number pushing

Date: 2008-11-08 03:47 am (UTC)
From: [identity profile] mothwentbad.livejournal.com
Well, if we'd left it up to white people, then wouldn't McCain-Palin be in office right now? Can we call it even?

Date: 2008-11-08 01:03 pm (UTC)
From: [identity profile] ambitious-wench.livejournal.com
Art, I've sent links to this article to Pam Spalding of Pam's House Blend, and in a related thread on Rachel Maddow's site. They are both operating under the false assumption that 70% of black voters voted for the initiative.

Date: 2008-11-08 01:52 pm (UTC)
From: [identity profile] nefaria.livejournal.com
Umm, Art hasn't proven that the number is false, he's just proven that it's entirely possible that it's false, because the margin of error is more than usual.

Date: 2008-11-08 01:56 pm (UTC)
From: [identity profile] ambitious-wench.livejournal.com
You're right, of course. I realized that after I posted. Not to worry, I stressed in both emails that using exit polls is risky at best because of the margins of error inherent in the sampling.

Thanks for catching my mistake and letting me correct myself.

Date: 2008-11-09 12:34 am (UTC)
From: [identity profile] lederhosen.livejournal.com
This is correct. It would not surprise me to learn that black Californians genuinely did have a higher percentage of support for Prop 8 than the state average*, but this poll is not reliable evidence for that.

*I wasn't there to see it, but my understanding is that the Yes campaign paid much more attention to racial minorities than the No campaign.

Date: 2008-11-08 02:27 pm (UTC)
From: [identity profile] ambitious-wench.livejournal.com
Art, another perspective: It touches on lots of the points you made here:


The author is a black bisexual poly in a monogamous hetero marriage.

