Last night I had a few hours to kill at a cafe with decent coffee and slow wifi, so I thought it would be a good chance to play around with the preliminary results from my race perceptions survey! I intend to polish this quite a bit once I get more respondents (the survey will remain open for now).
For now, here is a first "quick and dirty" look at how our perceptions of racial composition track with reality.
Map of Respondents
Here is the updated map, with ~380 people responding from across the country. Respondents cluster heavily around a few major cities, because I sent the survey to my Facebook friends (largely in WA) and to two region-specific subreddits (r/seattle and r/sanfrancisco). The sample is ~80% white and 55% male, and 90% of respondents fall within the 18-25 or 25-35 age groups.
|In red: density of ZIP codes across the US. Survey respondents are marked in blue.|
Census Maps of Race in the US
These are some simple (UGLY!) maps of the racial composition of the US by ZIP code. Nothing surprising here... High percentages of Asians are typically found only in major cities, Black Americans make up a larger share of the population in the southern and eastern US, and the north is very white.
Perception versus Reality
And now the reason you're probably reading this! Again, these figures do not carry my highest mark of quality, but they are interesting at least. First, here is the raw "guesses versus actual Census data" figure for the 4 primary races. I have renormalized people's estimates so that they add up to 100%. In general, people have the right idea. The finer details (below) tell an interesting story, though...
|Perceptions versus reality. Broad agreement.|
Note: the colors here do not match those in the figures below.
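The renormalization step above is just proportional scaling of each respondent's answers. Here is a minimal Python sketch, assuming each respondent's guesses are stored as a dict of percentages keyed by race; the `renormalize` name, the category labels, and the example numbers are hypothetical and not taken from the survey itself.

```python
def renormalize(guesses):
    """Scale one respondent's percentage guesses so they sum to 100."""
    total = sum(guesses.values())
    if total == 0:
        raise ValueError("cannot renormalize guesses that sum to zero")
    # Each guess keeps its relative size; only the overall scale changes
    return {race: 100.0 * pct / total for race, pct in guesses.items()}


# Hypothetical respondent whose raw guesses add up to 120%
raw = {"white": 60, "black": 20, "asian": 25, "hispanic": 15}
print(renormalize(raw))
```

Scaling proportionally (rather than, say, trimming the largest guess) preserves each person's sense of the relative sizes of the groups, which is what the comparison with the Census figures is after.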
If you calculate the "geometric distance" of each person's guesses from the actual numbers (basically adding the differences in quadrature), you get a distribution of how close to reality people are. The distribution is strongly peaked around ~10% (the median is 12%), meaning most people get the overall "racial landscape" right to within about 10%. Not bad!
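Adding the differences in quadrature gives the Euclidean distance between a person's vector of guesses and the vector of Census shares. A minimal sketch, assuming both are dicts of percentages over the same race categories; the function name, the ZIP-code shares, and the three respondents are all made up for illustration.

```python
import math
import statistics


def geometric_distance(guess, actual):
    """Differences between guessed and actual percentages, added in quadrature."""
    return math.sqrt(sum((guess[race] - actual[race]) ** 2 for race in actual))


# Hypothetical Census shares for one ZIP code and three hypothetical respondents
actual = {"white": 70.0, "black": 10.0, "asian": 10.0, "hispanic": 10.0}
guesses = [
    {"white": 65.0, "black": 15.0, "asian": 15.0, "hispanic": 5.0},
    {"white": 70.0, "black": 10.0, "asian": 10.0, "hispanic": 10.0},
    {"white": 76.0, "black": 4.0, "asian": 4.0, "hispanic": 16.0},
]
distances = [geometric_distance(g, actual) for g in guesses]
print(distances)                     # [10.0, 0.0, 12.0]
print(statistics.median(distances))  # 10.0
```

A distance of 10 here means the guesses are off by 10 percentage points in total, spread across all the races, which is how a single number can summarize "correct to within 10%".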
My colleagues will be pleased that the (very Poisson-like) distributions are the same for men and women: the women's median "correctness" score is only 1% larger. I haven't run a K-S test on these distributions or anything fancy like that, but I'd say this difference is within the sampling error.
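For reference, a two-sample K-S test compares two samples through the largest vertical gap between their empirical cumulative distributions. Below is a minimal pure-Python sketch of that statistic only (no p-value), assuming the men's and women's distances are plain lists of numbers; the `ks_statistic` name and the sample values are hypothetical. In practice, `scipy.stats.ks_2samp` computes both the statistic and a p-value.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov D: largest gap between the two empirical CDFs."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(sample_a), sorted(sample_b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Step past every copy of x in both samples so ties are handled correctly
        while i < n and a[i] == x:
            i += 1
        while j < m and b[j] == x:
            j += 1
        # i/n and j/m are the two empirical CDFs evaluated at x
        d = max(d, abs(i / n - j / m))
    # Once one sample is exhausted the gap can only shrink, so d is already the max
    return d


# Hypothetical distances for men and women
men = [8.0, 10.0, 11.0, 12.0, 15.0]
women = [9.0, 11.0, 12.0, 13.0, 16.0]
print(ks_statistic(men, women))
```

A small D (close to 0) is consistent with the two groups sharing one distribution; the p-value from a library routine would say how small is small enough for a given sample size.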
Finally, I've taken the difference between each person's guess and the real population, and plotted it as a function of the actual population. If people were spot-on, their answers would lie on the thick grey zero-line. If they over-predict the contribution from a given race, the answer lies above the line, and vice versa for under-predictions.
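The points in these plots are simple residuals: guess minus actual, paired with the actual share. A minimal sketch, assuming the guesses and Census shares for one race are parallel lists of percentages; the function names, the `tolerance` parameter for what counts as "on the line", and the numbers are hypothetical.

```python
def prediction_errors(guessed, actual):
    """Pair each actual share with (guess - actual), i.e. one point per respondent."""
    return [(a, g - a) for g, a in zip(guessed, actual)]


def classify(error, tolerance=0.0):
    """Label a residual relative to the zero-line."""
    if error > tolerance:
        return "over"
    if error < -tolerance:
        return "under"
    return "on the line"


# Hypothetical guesses of the Asian share vs. the actual share in each respondent's ZIP code
points = prediction_errors([10.0, 15.0, 20.0], [2.0, 15.0, 35.0])
for actual_pct, error in points:
    print(actual_pct, error, classify(error))
```

Plotting the error against the actual share (rather than the guess against the actual share) makes systematic over- or under-prediction show up directly as distance from a horizontal zero-line.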
For Asians, we see an interesting trend: the difference between guess and reality decreases as the actual population increases (with 1 very large outlier in the top right). This shape appears in the distributions of guesses for every race except white.
The distribution for white people looks slightly different. The same decreasing trend appears, but large over-predictions dominate where the actual percentage of white people is low.
These last 3 figures could be interpreted to tell this story:
We seem to think there are lots of minorities in communities that are actually very white. This shows up as the over-prediction in the Black and Asian figures at low actual percentages. The interpretation flips very quickly, though: in more diverse communities, we fairly consistently under-predict the contribution from minorities (and over-predict the % of whites).
I'm not convinced this is the whole story the data tell, but it's an interesting first look. There are many more variables here that I can study. For example, I haven't yet broken the results down by any of the supplemental data collected on the respondents themselves! Questions I could ask include:
- Which race predicts the racial landscape more "correctly"?
- Which gender does?
- Which age group?
We can imagine lots of fun ways to cut the data with all these variables at work!
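Each of these questions amounts to grouping respondents by one supplemental field and comparing a summary of their geometric distances (lower means closer to reality). A minimal sketch, assuming each respondent is a dict holding their distance plus fields such as gender and age group; the field names and records are hypothetical. With pandas, a `groupby(...).median()` on a DataFrame would do the same job.

```python
import statistics
from collections import defaultdict


def median_by_group(respondents, key):
    """Median geometric distance for each value of a supplemental field."""
    groups = defaultdict(list)
    for r in respondents:
        groups[r[key]].append(r["distance"])
    return {group: statistics.median(ds) for group, ds in groups.items()}


# Hypothetical respondents with precomputed distances
respondents = [
    {"gender": "female", "age": "18-25", "distance": 11.0},
    {"gender": "male", "age": "25-35", "distance": 9.0},
    {"gender": "female", "age": "25-35", "distance": 14.0},
    {"gender": "male", "age": "18-25", "distance": 13.0},
]
print(median_by_group(respondents, "gender"))
print(median_by_group(respondents, "age"))
```

The same function answers the race, gender, and age questions just by changing `key`, though with small groups the medians will be noisy.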
I have gotten lots of feedback from people noting shortcomings in the survey. I want to acknowledge a few excellent points and questions that have been raised:
- Yes, I am aware that polling my friends/colleagues/Facebook/Reddit will not give a robust or unbiased sample (however you choose to define such a thing).
- I still think the results have meaning despite these selection effects, provided one does not over-interpret the data.
- Hispanic is not a race, but an ethnicity.
- I agree that the lack of multi-racial or multi-ethnic options greatly limits the survey, but my priority was to get as large a sample as possible. Hard to do by myself, it turns out! I would love to get 1,000 people from the same city together and give them a 4-page survey that includes many racial/ethnic options! (This is actually a possible follow-up to this project.)
Incredibly, 40% of respondents have given me their email addresses. I solemnly promised not to spam you, and since this is not the final analysis of the survey, I have opted not to email anyone yet. If you think I should send this initial post around instead, let me know!
OK! That's all I've got at the moment. Please keep sharing/posting/tweeting/blogging the survey - I'd love to get the sample size to four digits!
Lastly: a HUGE thanks to all ~380 of you who've contributed so far!