A nice spreadsheet was posted a few days ago by The Chronicle, containing race, ethnicity, and gender data from ~4300 institutions of higher education across America. (Note: I believe the article and data are now behind a paywall, which was not in effect when I downloaded the data last week.)
It's a really intriguing data set, and I thought it was worth a few minutes of my time to play with it. My results are amusing, but I don't think I've fully captured the rich potential this data has to offer serious researchers.
My first question was simple: what does the most basic racial composition of US colleges look like?
I then remembered one of the first questions I ever asked when studying diversity/race in universities: how does the racial composition of a school compare to that of its state as a whole?
To answer this, I needed to match the Chronicle table against the 2010 US Census data by state. As an example, using every school, I have plotted the % of black students versus the % of black citizens in the school's state. The color of each pixel represents the density of points. The orange-ish "1 to 1" line represents perfect agreement between a school and its state. Relative to that line, 43% of schools have a higher % of black students than their state, and 57% have a lower %. I actually think this is remarkably good, considering how bad the data probably looked even 30 years ago. Still, one might argue there is a general need for improvement. Further, a more illuminating graphic might be % minority graduating students versus % minority state citizens.
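For anyone who wants to reproduce the above/below split, here is a minimal sketch of the comparison. It is not the code I used for the plot. The function name `share_above_state`, the record layout, and the numbers in the sample data are all made up for illustration and do not come from the Chronicle or Census tables.

```python
def share_above_state(schools, state_pct):
    """Return (fraction of schools above, fraction at or below) their state's %.

    schools:   iterable of (school_name, state_code, pct_black_students)
    state_pct: dict mapping state_code -> % black citizens in that state
    Schools whose state is missing from state_pct are skipped.
    """
    above = 0
    compared = 0
    for _name, state, pct in schools:
        if state not in state_pct:
            continue  # no Census figure to compare against
        compared += 1
        if pct > state_pct[state]:
            above += 1
    if compared == 0:
        raise ValueError("no schools could be matched to a state")
    frac_above = above / compared
    return frac_above, 1 - frac_above


# Illustrative, invented values (NOT real data).
sample_schools = [
    ("School A", "GA", 40.0),
    ("School B", "GA", 20.0),
    ("School C", "VT", 0.5),
    ("School D", "VT", 3.0),
    ("School E", "PR", 1.0),  # state not in the lookup, so it is skipped
]
sample_state_pct = {"GA": 30.0, "VT": 1.0}

frac_above, frac_below = share_above_state(sample_schools, sample_state_pct)
print(f"{frac_above:.0%} above the 1-to-1 line, {frac_below:.0%} at or below")
```

Note that this sketch counts a school exactly on the line as "not above", which is a choice I made here rather than something the data dictates.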
I then wanted to look at the spatial distribution of these schools, which is easiest to do (in my experience at least) by plotting the lat/long of ZIP codes, the simplest way I know to "geocode" addresses across the nation. Pity, the Chronicle table doesn't have ZIP codes... So I cleaned up the Institution Name column it did provide and matched it against the US Dept of Education Accredited Postsecondary Institution database (a truly fascinating data set on its own). About 2/3 of the schools in the Chronicle list matched exactly (a strict string match) to the Accredited database. Rather than chasing down more string-matching ghosts, I called that good enough. From this I could match accredited schools' ZIP codes to the Census!
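The clean-up-then-strict-match step can be sketched roughly as follows. This is an illustration of the idea under my own assumptions, not the exact cleaning I did: the normalization rules, the helper names `normalize_name` and `match_zip`, and the school names and ZIP codes in the sample are all hypothetical.

```python
import re


def normalize_name(name):
    """Lowercase, spell out '&', drop punctuation, and collapse whitespace."""
    name = name.lower().replace("&", " and ")
    name = re.sub(r"[^a-z0-9 ]", " ", name)
    return " ".join(name.split())


def match_zip(chronicle_names, accredited):
    """Strictly match cleaned names and return (matched, unmatched).

    chronicle_names: list of institution names from the Chronicle table
    accredited:      list of (institution_name, zip_code) pairs
    matched:         dict mapping original Chronicle name -> zip_code
    unmatched:       list of Chronicle names with no exact cleaned match
    """
    lookup = {}
    for name, zip_code in accredited:
        # Keep the first ZIP seen for a given cleaned name.
        lookup.setdefault(normalize_name(name), zip_code)

    matched, unmatched = {}, []
    for name in chronicle_names:
        key = normalize_name(name)
        if key in lookup:
            matched[name] = lookup[key]
        else:
            unmatched.append(name)
    return matched, unmatched


# Hypothetical names and ZIP codes, for illustration only.
chronicle = [
    "Example State University",
    "Sample College of Arts & Sciences",
    "Univ. of Nowhere",
]
accredited = [
    ("EXAMPLE STATE UNIVERSITY", "00001"),
    ("Sample College of Arts and Sciences", "00002"),
    ("University of Nowhere", "00003"),
]

matched, unmatched = match_zip(chronicle, accredited)
print(f"matched {len(matched)} of {len(chronicle)}; unmatched: {unmatched}")
```

The abbreviation in the third name is the kind of "ghost" a strict match will never catch. Fuzzy matching would recover some of these, at the cost of false positives that then need checking by hand.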
Here is the US map of accredited institutions...
And here is the map broken down by schools with a higher % black students than their state (Blue) versus schools with a lower % (Red) -- Note: this is somewhat misleading, especially in New England, because I plotted Red after Blue, so Red points cover Blue ones. That makes Red seem to dominate in some areas where the two are actually quite comparable.
Update: Here are both maps as one graphic, hosted on Visual.ly
Going back to the Red versus Blue maps, there are certainly other geographic trends among these schools. My "by-eye" analysis (read: shooting a bit from the hip here) is that Blue schools tend to be clustered around big cities, and Red schools in more rural areas, especially in Texas and the Midwest.
This has been a simple demonstration of just some of the fascinating stories this data has to tell. I certainly hope many more big surveys like this are published, and would be fascinated to see some with more details about the student body. The ultimate goal is of course to find where/when we are failing to serve people in higher education, and how we can improve!