The Lego Wall

This has nothing to do with data visualization, but I'm counting it as a hack project.

My wife and I have had an unused fireplace in our downstairs den. We were looking for a colorful and fun way to cover it so our pets wouldn't climb in. This is our solution. It is brilliant, which is how you can tell Sarah (wife) came up with it!

Check the rest of the album out here...

Lego Wall

Happy Holidays!

2016 Election Issues, by the Numbers

You might have read or heard somewhere that in 2016 the US will once again hold a presidential election. For my international readers, a US election is a lot like a game of poker being played by circus animals on live TV: there may be a "winner", but typically we all lose.

There was much media chatter after the Democratic candidate debates (e.g. on NPR) saying how they focused more on "the issues" than the Republicans. This might be empirically true, given how much smaller a field the Democrats are currently competing in. It got me thinking: what are these "issues", and what does each candidate actually say about them?

I'm of the opinion that a good metric to judge a presidential candidate by is their website. They have total control of the content, layout, and aesthetic, and most are well funded (or, well enough to hire a startup web developer at least). You should be able to see what the candidate really thinks of their own image and platform, and their stance on "the issues". It's amazing how terrible some of these websites are...

Words Words Words

Still, websites offer a ready source of information on the distilled message each candidate has prepared on "the issues". So to gather this data, I went to each candidate's website, found whatever page (or usually pages) they had listed as "issues" or similar, and wholesale copied all the text. This totaled over 128k words, 74k from the Democrats and 54k from the Republicans. Here are two word clouds showing all of this text for the Democrats and Republicans:

These simple word frequency clouds show fascinating contrasts and similarities. The top words for the Dems include candidate names, while for the GOP it's "President" and "tax". Overrepresentation of social issues is clear for the Dems (e.g. "health", "care", "women") while the GOP has a strong emphasis on "Security". The word "government" is much more prominent for the GOP than the Dems.

When I dig down into the small/medium words, I'm struck by how similar they really are. The main bipartisan "issues" of the day appear to include "security", "jobs", and "health", but that probably doesn't surprise anyone.
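Underneath any word cloud is just a word-frequency tally. Here is a minimal sketch of that tally in Python, assuming the copied text sits in a plain string. The stop-word list and the `top_words` helper are hypothetical stand-ins for illustration, not the tool that actually drew the clouds:

```python
import re
from collections import Counter

# A tiny, illustrative stop-word list (an assumption; real lists are much longer)
STOP_WORDS = {"the", "and", "of", "to", "a", "in", "for", "is", "that", "we", "our"}

def top_words(text, n=10):
    """Return the n most frequent words in text, ignoring case and stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(n)

# Made-up sample text, not taken from any candidate's website
sample = "Security and jobs. Jobs and health care. Security for our jobs."
print(top_words(sample, 3))
```

A word cloud then simply scales each word's font size by its count.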

Comparing Numbers

While Martin O'Malley may not have had a ton to say in the Democratic debates, he certainly has a lot to say online. We get it, O'Malley, you probably were a NaNoWriMo winner three years running too.

On the other end of the spectrum (both in politics and verbiage) is Lindsey Graham, whose website squeaked out a dismal ~2200 words... about the length of two Op-Ed pieces in the NYT or WSJ.

I don't know what the optimal number of words is, but I find it fascinating how few some of the GOP candidates have put up, and how much more the Democrats did. Speculating, the paucity of words from GOP candidates could be due to a believed lack of interest among potential voters in long-form content. Typically the GOP websites had fewer topic headings in their "Issues" pages (a notable exception is Rand Paul's page).

One of the best and worst types of analysis you can do with written language is study its complexity. A very simple version of this, which I've featured repeatedly on this site in the past, is the Flesch-Kincaid Reading Level. This metric attempts to compute how complex language is by measuring the # Words per Sentence, and the # Syllables per Word.

There's good reason to think this is not a fair measure of intellectual complexity, but still there have been curious trends found with it in the past. You typically see a slew of articles about the Readability or Reading Level of State of the Union speeches each year. Also, the government requires some documents to pass certain Reading Level tests to promote more accessibility to the laws of the land.
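For the curious, the Flesch-Kincaid grade level is 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. Below is a rough sketch of that formula in Python. The syllable counter is a crude vowel-group heuristic I'm assuming for illustration (real implementations use pronunciation dictionaries), so treat the output as approximate and not as the exact numbers behind the plot:

```python
import re

def count_syllables(word):
    """Rough syllable count: number of vowel groups, with a silent-'e' tweak."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing silent 'e' (as in "care") as not adding a syllable
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)

def fk_grade(text):
    """Flesch-Kincaid grade level of a block of (non-empty) English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59

print(round(fk_grade("The cat sat. The dog ran."), 2))
```

Short sentences of one-syllable words can score below zero, which is one more hint that the grade level is a blunt instrument.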

With appropriate caveats in mind, here is the readability metric for each candidate's "Issues" page(s):

Trump's "Issues" page ranks lowest in terms of reading grade level. Bush isn't far behind him. Interestingly, they had the highest total word counts among the GOP sources.

Rubio tops it off by a wide margin. Some say the ideal sentence should be 15-20 words. This randomly selected sentence has 42 words, and certainly contributes to the complexity of his text:
"The horrific mass shootings should prompt us to ask what causes people to commit these acts — like what can be done to improve the way we treat serious mental illness — rather than seize on the weapons they used." [source]
The Dems, however, are all over the map compared to the GOP, but really are quite close in actual score. I suppose I'm not surprised O'Malley is the champ here too. Given there are only 3 candidates, and no strong outliers, it's not clear what we can say overall about the Dems' readability scores.

Pick your Favorite

For fun, here's the word cloud for each of the candidates' "Issues" pages. (pro tip: make a drinking game with these, maybe for the next debate?) I'll leave these without comment, except to say I'm sure you can interpret them any way you please.


There's a lot you can learn from a bag of words, and a lot of neat games you can play. I've made the data and a snippet of code freely available on GitHub. A neat project would be, say, to create a probabilistic sentence generator based on text from each candidate... It would also be awesome to see what we might learn from archived webpages of successful candidates from elections past. Does text-heavy win? Or are people excited about the short-and-sweet message?
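To give a flavor of the sentence generator idea, here is a minimal sketch of a first-order Markov chain in Python: each next word is drawn at random from the words that followed the current word in the source text. The function names and the sample sentence are made up for illustration, and this is not the code in the GitHub repo:

```python
import random
from collections import defaultdict

def build_chain(text):
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain

def generate(chain, start, max_words=20, seed=None):
    """Random walk through the chain, starting from the word `start`."""
    rng = random.Random(seed)
    out = [start]
    # Stop at max_words, or when the last word never had a follower
    while len(out) < max_words and chain.get(out[-1]):
        out.append(rng.choice(chain[out[-1]]))
    return " ".join(out)

# Made-up sample text, not taken from any candidate's website
sample = "we will create jobs and we will protect jobs and we will grow"
chain = build_chain(sample)
print(generate(chain, "we", max_words=8, seed=1))
```

Feed it one candidate's 10k+ words instead of one sentence and the output starts to sound eerily on-message.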

I can't say I'm excited about this coming election season, but at least it will give material for Saturday Night Live to work with... and that's something we can/should all appreciate.

Awesome visualization: Kepler Orrery IV

Today my good friend, hopeless Cubs fan, and supremely talented astro PhD student, Ethan Kruse, released a new version of the Kepler Orrery animation.

For those not familiar with the term, an "orrery" is a mechanical model of the solar system. These types of animations have become popular in the era of exoplanets to demonstrate the incredible variety of planetary systems we have discovered. Many look very strange compared to our own, but some, like Kepler 452, look tantalizingly similar.

Check out Ethan's work below, or download the animated gif version here.

Edit: Like an open source champ, Ethan has even provided the source code on GitHub! Awesome!

How I Cite

Hello blog-o-sphere! It's been a long PhD thesis-induced hiatus here on If We Assume, but I'm working hard to catch up on my backlog of project ideas.

A couple months ago this question was posed on Twitter:

I offered to "quickly write something up", and in true academic fashion here it is 3 months later!

Gender study at the IAU meeting


For any astronomers heading to Hawaii next week for the big IAU meeting, please consider helping me continue the study of gender demographics at astronomy conferences!

This conference brings together the largest and most diverse group of astronomers for 2 weeks of science and sunshine. It's also a perfect chance to continue my ongoing study of gender in speakers versus question askers.

The web form is here, and will be active for the entire 2 weeks of the conference!

See you in Hawaii!

PhD Thesis Done!

Last week I defended my PhD thesis in astronomy! My brother-in-law was good enough to capture the event with my iPhone, so I'm sharing it here for anyone interested. The talk is full of astronomy data visualizations, some that I created last year for talks, and some new! Plus, the subject matter is NASA's epic Kepler mission, so I got that going for me...

Davenport PhD Thesis Talk from James Davenport on Vimeo.

The slides are a bit hard to see, so I'll post those online soon (and will update with a link)

Moving forward I'm excited to work on some of the ideas for If We Assume that have fallen to the backburner these past few months. On the professional side of things, starting in the fall I'll begin a 3 year NSF Postdoctoral Fellowship in the department of Physics and Astronomy at Western Washington University!

Finally, if you're not watching along as New Horizons passes by Pluto tomorrow (Tuesday, 2015 July 14), then you're missing a once-in-a-lifetime opportunity! Press conferences will be streamed online, starting at 0730 EDT. Images and videos will be going online constantly over the next few days. Follow the mission on Twitter: @NewHorizons2015, and my friend Alex Parker who works on this exciting mission: @Alex_Parker!
from APOD

Don't touch that dial...

It's been radio silence here since March. In case anyone was wondering "what the hell happened to Davenport's blog?!" I wanted to post a note saying: thesis.

My PhD thesis defense is scheduled for mid-July, and currently I'm racing to get the final chapter written! While this blog has languished, my GitHub profile has been conspicuously green...

The above image is a play on the famous cover art from Joy Division's album "Unknown Pleasures", but uses actual data from my thesis!

I've got a whole mess of blog posts and data visualization ideas on the back-burner (like 40 draft posts/ideas started!), and hopefully in July there will be enough time to get them out. Until then, don't touch that dial...

In the meantime, check out this amazing data visualization by my friend and fellow UW graduate student Ethan Kruse!

PAC-MAN: The End of March Mapness

Well we've come to the end of March. I was too busy the past few weeks to keep the high pace of map posts up, but I think we had a few gems. Next year we'll do even better!

For the final post this month, I'm tipping my hat to Google, who have created the best map of the month. Starting today, you can play PAC-MAN in your browser using Google Maps!

No seriously, try it!

When you go to Google Maps today (and tomorrow, I'd wager), you'll see a PAC-MAN button. Click it anywhere and you're ready to play on your local streets.

This feels a lot like the in-browser version of PAC-MAN that Google featured as a "Doodle" on May 21, 2010 in celebration of PAC-MAN's 30th anniversary. 

Happy April 1st.

When Map Projections Go Awry

One constant irk I have with online map tools such as the venerable, decade-old Google Maps, is that the projection is fixed. Most online mapping applications use what's known as "Web Mercator" for their projection.

This works great for most all of my driving and walking around needs!

This doesn't work great for my exploring the world from my couch tasks...

As a simple example:

In this image Greenland is twice the size of the USA. In reality, the USA is about 5x larger than Greenland!
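Where does the distortion come from? On a Mercator map the linear scale grows as 1/cos(latitude), so areas are inflated by 1/cos²(latitude). Here is a quick sketch of that factor in Python. The two "representative" latitudes are my own rough assumptions for illustration, and a single latitude per region only gives a ballpark feel for the effect:

```python
import math

def mercator_area_inflation(lat_deg):
    """Factor by which Mercator inflates areas at a given latitude (1.0 at the equator)."""
    return 1.0 / math.cos(math.radians(lat_deg)) ** 2

# Representative latitudes (assumptions for illustration, not measured centroids)
greenland_lat = 72.0
usa_lat = 39.0

ratio = mercator_area_inflation(greenland_lat) / mercator_area_inflation(usa_lat)
print(f"Greenland is inflated about {ratio:.1f}x more than the contiguous USA")
```

At 60° latitude the inflation is already a factor of 4, and it blows up toward the poles, which is why Web Mercator maps cut off before reaching them.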

Hilarious side note: I put this post in the blog-queue 2 days ago, yet IFLScience posted almost the exact same article as this yesterday... whoa. Kinda neat!

Maps about Coffee Locations

Besides maps, one of my favorite topics to study on this website is coffee!

Here's a simple map I made by overlaying locations of coffee shops onto a map of the UW campus, and drawing some simple circles. What it shows is the walking distance to coffee shops around campus, and that the entire UW campus is covered within a 2-minute walking time! [original post]

Next we extend this idea, and look at coffee shops around the country. Below is a map of the distribution of Starbucks stores in the US. The wire mesh is a triangulation that enables us to find the lowest Starbucks density point, which is around 140 miles from the nearest Pumpkin Spice Latte! [original post]
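If you want to play with the "farthest from a latte" question yourself, here is a simplified sketch in Python. To be clear, it swaps the triangulation for a plain brute-force grid search: for every grid point, find the nearest store, and keep the point where that distance is largest. The store coordinates are hypothetical, and a real version would also need to throw out grid points that fall in the ocean or outside the country:

```python
import math

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in miles."""
    r = 3958.8  # mean Earth radius in miles
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat = p2 - p1
    dlon = math.radians(lon2 - lon1)
    a = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def farthest_point(stores, lat_range, lon_range, step=0.5):
    """Grid point whose nearest store is farthest away: (distance, lat, lon)."""
    best = (-1.0, None, None)
    lat = lat_range[0]
    while lat <= lat_range[1]:
        lon = lon_range[0]
        while lon <= lon_range[1]:
            nearest = min(haversine_miles(lat, lon, s_lat, s_lon)
                          for s_lat, s_lon in stores)
            if nearest > best[0]:
                best = (nearest, lat, lon)
            lon += step
        lat += step
    return best

# Hypothetical store coordinates, roughly Seattle, Denver, and Chicago
stores = [(47.6, -122.3), (39.7, -105.0), (41.9, -87.6)]
print(farthest_point(stores, (30.0, 49.0), (-124.0, -70.0), step=1.0))
```

The grid search is slow for thousands of stores, which is exactly why a triangulation is the smarter tool: the emptiest spots sit at the centers of the biggest empty triangles.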

Finally, here's a really neat series of interactive maps by the MIT Media Lab, showing coverage maps of independent coffee shops in several major cities throughout the US. Below I'm showing a screenshot of the map for Portland, OR. [original post]

Mapping Road Trips

Randy Olson, moderator of /r/dataisbeautiful, created a great map project about a week ago, which I thought I would start week 3 of March Mapness off with!

Here is a map of the "Optimal Road Trip", which visits major landmarks in every one of the lower 48 states in the US, as solved by a genetic algorithm:

Randy has an awesome blog post about the project, and includes a live version of the map you can interact with! Check it out here, or follow Randy on Twitter for a near-constant stream of awesome data visualization!

Maps in D3.js

Today I want to highlight maps using D3.js, the amazing JavaScript library for "data-driven documents" that powers a ton of the web-based visualization landscape these days!

This is a screenshot from a really neat animated map by Ben Dilday that draws the USA over time, with each state popping into existence the year it joined the union. Go click on this link to check out the animated original source!

Of course, one can't talk about data visualization in D3 without mentioning the incomparable Mike Bostock, its creator. His work is amazing (check out this awesome gallery). In particular, here are a few animated or interactive maps he's created that stand out:
What I love about these JavaScript/D3 examples is that you can learn so much by just pulling them apart for your own purposes. And of course, there is a rich tutorial section in the official D3 GitHub repo. Go learn! Go create!

More Metro Madness

Here's another great visualization of metro activity, this time it's Shanghai, created by the talented Till Nagel. More details of this awesome project can be found on his website.

Happy March Mapness!

Shanghai Metro Flow from Till Nagel on Vimeo.

found via Reddit's /r/dataisbeautiful, a community of 2.5 million data visualization lovers!

iPhone thickness over time

Yesterday Apple announced the new, super-duper thin MacBook. Thinness has been a metric of some obsession for the past 5 years for Apple (and many other companies), both in the notebook and mobile phone product lines. Perhaps it mirrors our society's anxiety about body image...

For me, a thin phone makes carrying the device constantly much more comfortable. (Massive screen size is not a big selling point for me.) So I was curious, how has the iPhone shrunk since its conception in 2007?

The figure above shows a nice steady trend towards thin... Using a linear extrapolation to this data, we can expect that sometime in 2023 Apple will announce an iPhone with a thickness of 0mm. That will be 16 years after the introduction of the device, so I project the product name will be approximately iPhone 17.
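If you'd like to reproduce this very serious forecast, here is a minimal sketch of the linear extrapolation in Python. The thickness values are approximate published specs as I recall them, one flagship model per release year, and are not necessarily the dataset behind the figure, so the crossing year you get may differ a little from the one quoted above:

```python
def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def zero_crossing(xs, ys):
    """The x value where the fitted line reaches y = 0."""
    slope, intercept = linear_fit(xs, ys)
    return -intercept / slope

# Release year and approximate thickness (mm), original iPhone through iPhone 6
# (assumed values for illustration)
years = [2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014]
thickness_mm = [11.6, 12.3, 12.3, 9.3, 9.3, 7.6, 7.6, 6.9]
print(f"Thickness reaches 0 mm around {zero_crossing(years, thickness_mm):.0f}")
```

Needless to say, extrapolating a straight line through zero is exactly the kind of thing you should never do with real data.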

You heard it here first, folks!

Mapping the daily commute

Here's a cool map Seth Kadish posted a couple days ago at Vizual Statistix. It shows the % of workers in counties who commute across state lines. [original post]

I'm especially interested in the area just north of Portland OR, which is Vancouver WA (not Vancouver BC). More than 30% of all the adult workers in Vancouver are heading over that I-5 bridge every day to work! Yikes!

The story is even more interesting on the east coast and in the south... check it out:

Thanks to Seth for sharing this, and again friends, be sure to subscribe to his great blog for many more cool maps and visualizations!!

Spinning Maps!

Here's a fun recent map I made, using Python's Basemap toolkit. It was a good learning exercise for me, and I was happy to put the code online so others could learn from it. This data is simply the population density of the entire world, saved as a big (lat,lon) grid of values. [original post here]

I love spinning globes/maps for 2 reasons:

  1. They avoid most map projection nonsense about skewing or stretching data
  2. They remind me of the globe I had as a kid, which my brother and I would play with. I remember spinning it as hard as I could, and putting my finger down on random places in the world. I suspect this is a very common experience, and one of the most profound interactions you can have with a visualization....

World Population Density from James Davenport on Vimeo.

Here's another great spinning map, but this time not of Earth. This is the dwarf planet Ceres, and the map is made by stitching pictures together as it rotates. The awesome animation was made by my friend Dr Mike Solontoi! [original page here] Ceres has also been in the news lately because of mysterious bright dots that showed up in the middle of a crater (you can see them in the animation).

I love spinning maps of other worlds, because it makes them feel so much more like real places!

Finally, I can't talk about spinning maps of other worlds without mentioning my friend Dr Alex Parker's awesome maps of many extrasolar objects... he's got a whole blog post on spinning moons! [original post here] I've featured Europa here, because it looks so damn cool!

Maps of Destruction

Today for March Mapness I am featuring a few maps about destruction.

First, a map I created recently charting the locations of all the known volcanoes in the world that have erupted. Each location is colored by the number of known eruptions. The incredible "Ring of Fire" is visible around the Pacific Rim, with most volcanoes having erupted many times in the past. [original post]

Next is a map from Seth at the always amazing Vizual-Statistix, showing the distribution of tsunami triggers worldwide. [original post] Interestingly, this map seems to resemble the one I made of volcanoes! There's probably a science for that...
Seriously, go follow Seth's work if you're not already!

Speaking of tsunamis, I have to mention this incredibly powerful visualization from 2011 about the devastating earthquakes that rocked Japan. I've posted about this video before, as it is one of the most simple and compelling data visualizations I've ever seen.

Finally today, a map of the Black Death plague. There are countless excellent maps on this subject, and the history of disease and data visualization are closely linked. For example, the Broad Street Pump, or Florence Nightingale.
Bubonic plague map

Mapping buses like little ants

Here is one of my favorite maps I've generated for this blog: 24 hours of the King County (Seattle) Metro buses zipping about...

Happy day 3 of March Mapness!

[original post]

24 Hours of King County Metro from James Davenport on Vimeo.

Maps of Twitter

Maps of Twitter users are fairly common now - but they are still amazing. Here is (a screenshot from) one of my favorites by the good folks at Mapbox [link to interactive map]
This is especially neat, since only a few % of Tweets are geocoded. To be included in this map, you have to Tweet using a device w/ GPS (like your phone) and opt-in to sharing your location. Enough people do to make this fascinating map!

Here is another awesome version by reddit user Kombutini showing Twitter activity over a 24hr period. [Link to original post]

And finally, here's one I made in 2013 showing the average readability or reading ease of Tweets across ZIP codes [original blog post]. It's mostly noise, showing no real trend in readability geographically. What I did find was a significant trend with the % of college graduates per ZIP code, such that ZIP codes with more college grads were Tweeting more complicated language. I thought it was interesting enough to write a short paper on it.

That is just 3 interesting maps of Twitter data, from thousands that have been made and shared online already. What consistently lights up my imagination is how data and social scientists working with companies like Twitter (and Facebook and Google) are using social media or search services, combined with geographic data, to study humanity. We might detect disease outbreaks, natural disasters, possibly intervene when people are feeling suicidal, or judge the mood of society on real issues. It's an amazing time we live in.

March Mapness

Today I am proud to announce the beginning of something wonderful and silly:

For the entire month of March I will be publishing blog posts about Maps! As an homage to the famous college basketball tournament, I am calling it "March Mapness". This is an idea of staggering brilliance, and I want it to catch on for the whole data visualization blog-o-sphere. You heard it here first, folks: March is officially the new month of maps in the dataviz world!

I can't promise I'll actually post a map every day, but I will feature some classic map-based posts from years past on If We Assume, cross-post some great content from friends on other blogs, and maybe even generate some new awesome visualizations!

Stay tuned!

The Words of Spock

This morning (2015 Feb 27) it was announced that Leonard Nimoy had passed away. 
Here, to celebrate his life and his work, is a word cloud I made of all the dialogue he had in Star Trek (the Original Series)

(a couple alternate layouts: 1, 2)

Live long and prosper

Because it seemed logical, I re-made this word cloud with a familiar shape...

World Population Density Animation

I have been working on learning to use the mapping package Basemap in Python. This is an early exercise in learning the machinery for some more posts on maps, and for normalizing things by underlying population density.

World Population Density from James Davenport on Vimeo.

Here is a spinning globe with population density drawn. This was code written in 20 min, and took about 100 min to render the images on my laptop. Basemap is straightforward to use... once you know what you're doing. However, the maps don't seem quite as mature as those in IDL (to be fair, IDL has been a commercial mapping product for many years). If you'd like to use this as an example for learning Basemap/Python, the code is on GitHub! I am also working to make the same map using IDL as a teaching tool.
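For a taste of what the Basemap side looks like, here is a stripped-down sketch of the spinning-globe idea, not the actual code from the repo: step the central longitude of an orthographic projection through a full rotation and save one image per frame. The function names, the fixed `lat_0=20`, the colormap, and the file naming are all assumptions for illustration, and `lons`, `lats`, `density` are 2D NumPy arrays you would load yourself:

```python
def frame_longitudes(n_frames):
    """Central longitudes (degrees) for one full rotation, one per frame."""
    return [360.0 * i / n_frames - 180.0 for i in range(n_frames)]

def render_frame(lons, lats, density, lon_0, filename):
    """Draw one orthographic view of a (lat, lon) density grid centered on lon_0."""
    # Imported here so the helper above works without Basemap installed
    import matplotlib
    matplotlib.use("Agg")  # render straight to files, no window needed
    import matplotlib.pyplot as plt
    from mpl_toolkits.basemap import Basemap

    fig = plt.figure(figsize=(6, 6))
    m = Basemap(projection="ortho", lat_0=20, lon_0=lon_0, resolution="c")
    m.drawcoastlines(linewidth=0.5)
    # latlon=True asks Basemap to convert degrees to map coordinates itself
    m.pcolormesh(lons, lats, density, latlon=True, cmap="hot")
    fig.savefig(filename, dpi=100)
    plt.close(fig)

# Usage sketch:
# for i, lon_0 in enumerate(frame_longitudes(360)):
#     render_frame(lons, lats, density, lon_0, f"frame_{i:04d}.png")
print(frame_longitudes(4))
```

Rendering every frame from scratch like this is simple but slow, which matches my experience of the render taking far longer than the coding.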

hat tip: got code help from here!

Football Statistics: the Impact of Smiling

If you've ever watched a professional football game (and this is probably true for most professional sports) then you have seen these little portraits of players that appear at the bottom of the screen. On some TV networks they are actually short video clips where the players announce their alma mater, on some networks they are animations where the players each raise their heads and occasionally blink (these creep me out), and for other networks these head-shots are just still photos. Some players smile in their photos, some do not.

Key & Peele have a recurring bit about this player introduction phenomenon.

While watching a Seahawks game this past year, my mother-in-law posed an amusing question: Do players who smile in their photos play better football?

The question is simple and whimsical, in other words perfect. I don't know anything about how often these photos are taken, what the players' mindset is when they're shot, or if there is any prior expectation about attitude/persona and player record. I set out to find some answers...

For this study I am only focusing on quarterbacks (QBs) in American professional football (NFL), though it would be easy to extend to all positions if anyone can help me get the data! Right away I know I'll need a few ingredients: photos of each player, some classification of their smiles, and some real stats on their records in the NFL.

How Long Do Senators Actually Serve?

Today is the 2015 State of the Union address by President Obama. His speech has already been widely written about, and as with most SOTU addresses, presenting it is largely a formality. There are many traditions associated with the SOTU, including the POTUS having to be invited to give the talk.

Another SOTU tradition, which I find very American (and started in the 1960's), is the SOTU response given by a member of the opposite political party to the POTUS. This year Joni Ernst, a brand new Senator from Iowa whose campaign was dominated by hog castration, will be the 16th woman to deliver a SOTU response.

Senator Ernst may be a public figure for quite some time, so I wondered today:
how long do Senators typically serve? 
There are some very well known examples of career Senators, for example the legendary badass, Daniel K. Inouye. But what of the typical Senator? Here are some numbers...

After nearly 230 years, the US Senate has more than 1855 distinct former members. The term length for Senators is 6 years (compared with the typical length of 2 years for representatives).
Here is the histogram of term durations for all past Senators. Note: some people served multiple, non-consecutive terms, or changed parties, which counts in this data as a separate term. You can clearly see the spikes of 1, 2, ... 6 terms. The average term is 7.6 years (median 6 years) for all past Senators.

However, the typical duration a Senator serves appears to be changing. Above I have plotted the duration versus the year elected for all past Senators. You can see in the running 2-year median the typical serving duration has increased, especially since WW2. While the overall average for length served is 7.6 years, since 1950 it has risen to 11.8 years. This means on average Senators are elected twice, and will serve during at least 2 presidencies.
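The numbers above are just averages and medians over a table of (year elected, years served). Here is a minimal sketch of that bookkeeping in Python. The records are made up purely for illustration (the real data is in the repo), and the `window` size for the running median is a free parameter in this sketch:

```python
from statistics import mean, median

def summarize(terms, since=None):
    """Mean and median duration for (year_elected, years_served) pairs."""
    durations = [d for year, d in terms if since is None or year >= since]
    return mean(durations), median(durations)

def running_median(terms, window):
    """Median duration among terms starting within `window` years up to each start year."""
    out = {}
    for year in sorted({y for y, _ in terms}):
        recent = [d for y, d in terms if year - window < y <= year]
        out[year] = median(recent)
    return out

# Made-up records for illustration only (not the real Senate data)
terms = [(1900, 6), (1902, 2), (1948, 6), (1952, 12), (1960, 18)]
print(summarize(terms))              # all records
print(summarize(terms, since=1950))  # only those elected in 1950 or later
print(running_median(terms, window=10))
```

The median is the better "typical" number here, since a handful of multi-decade careers pull the mean upward.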

This is longer than the average reign of a Pope (7.3 years), but shorter than a US Supreme Court Justice (16.4 years).

GitHub repo with data is here

Making Quality Animations in IDL

Here is a brief post that outlines how I easily create high quality animations using IDL. I realize IDL is not the most popular language in the astrophysics community these days, but this process works for any language that can render plots to a file. While there are other ways to do this in Python, I think this process works well since it allows you to experiment with each stage independently. I'll also embed a few animations I have created, hopefully to give you an idea of what is possible.

The workflow is relatively simple:
render many still frames > convert to images > encode them as a video

Note, this is the same workflow to create animated gifs, just swap the last step.

Note also, this guide assumes you are using OS X or Linux.
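As a rough sketch of the last two steps, here is how they could be scripted from Python. I'm assuming the still frames were rendered as numbered EPS files (`frame_0000.eps`, ...), that ImageMagick's `convert` does the rasterizing, and that `ffmpeg` does the encoding. Those tool choices, file names, and settings are assumptions for illustration, so swap in whatever you actually use:

```python
import subprocess

def convert_command(eps_path, png_path, density=150):
    """ImageMagick call that rasterizes one PostScript frame to PNG."""
    return ["convert", "-density", str(density), eps_path, png_path]

def encode_command(pattern, output, fps=30):
    """ffmpeg call that encodes numbered PNG frames into an H.264 video."""
    return [
        "ffmpeg", "-y",
        "-framerate", str(fps),
        "-i", pattern,           # e.g. frame_%04d.png
        "-c:v", "libx264",
        "-pix_fmt", "yuv420p",   # widely compatible pixel format
        output,
    ]

def make_movie(n_frames, output="movie.mp4"):
    """Convert each frame to PNG, then encode them (needs both tools installed)."""
    for i in range(n_frames):
        subprocess.run(
            convert_command(f"frame_{i:04d}.eps", f"frame_{i:04d}.png"), check=True
        )
    subprocess.run(encode_command("frame_%04d.png", output), check=True)

print(" ".join(encode_command("frame_%04d.png", "movie.mp4")))
```

Keeping the stages separate means you can re-encode at a different frame rate without re-rendering a single frame.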

12 days of data from Kepler 17

Here is a preliminary result from my Ph.D. research that I showed in my AAS 225 talk earlier in January. This is from the final of four journal publications that will make up my thesis. My adviser Prof. Leslie Hebb and I are working (hard!) to model the evolution of starspots on the Sun-like star, Kepler 17. I am working hard to get PhD paper #3 out to publication, and to finalize these results for publication before I graduate (shameless plug for my CV). I'm sharing this video here because I think it's an elegant visualization of the method we are using, while also demonstrating the challenges and possibilities!

This video shows a single 12-day portion of data from Kepler 17. The data (in black) come from NASA's Kepler mission. The model (in red) is our best estimate of the star+planet conditions during one full rotation of the star, or about 12 days. We then repeat this analysis every 12 days over a span of 4 years (Earth years, that is!) to learn about the evolution of the starspot features we see.

When the planet crosses in front of its host star, we sometimes observe a "bump" in the transit. This is caused by the planet crossing or occulting a dark starspot region on the stellar surface (akin to a Sunspot). The animation shows the rotating star with spots (top), the full 12-days of data (middle), and zoom-ins during the planetary transit (bottom).

The planet (Kepler 17b) has a very short "year" indeed, going around its host star every 1.5 days. This means during the 12 days it takes Kepler 17 to rotate, the planet eclipses 8 times! With this remarkable geometry, we are able to deduce and map the locations and sizes of at least 8 starspots. Previous techniques using imaging data without a transiting exoplanet could at most infer the approximate properties for 2 or 3 starspots. With this method we believe we can robustly detect up to 10x more starspots, and even trace their evolution. While this model has 8 spots included, there is strong evidence in the 4 year dataset that at times fewer, and sometimes even more, spots are observed! This information will unlock details about magnetic fields and the inner workings of stars beyond our Sun. For reference, only a small handful of stars currently have detailed information about their magnetic fields or starspots. Our characterization of Kepler 17 will likely be the most detailed ever analysis of spots on any star besides the Sun.
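The "8 transits per rotation" figure is simple arithmetic: 12 days / 1.5 days = 8. Here is a tiny sketch in Python that lists the transit times within one rotation, using the rounded values quoted above (the `transit_times` helper and the choice of a first transit at day 0 are assumptions for illustration, not part of our actual model):

```python
def transit_times(orbital_period, rotation_period, first_transit=0.0):
    """Mid-transit times (days) that fall within one stellar rotation."""
    times = []
    t = first_transit
    while t < rotation_period:
        times.append(t)
        t += orbital_period
    return times

# Rounded values quoted in the text: 1.5-day orbit, 12-day rotation
times = transit_times(1.5, 12.0)
print(len(times), times)
```

Each of those transits scans a different longitude of the rotating star, which is what lets us map spots all the way around it.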

My Thesis Talk at AAS 225

About a week ago I gave a 15 minute dissertation talk at the American Astronomical Society's 225th meeting in Seattle, WA. I was awarded the Rodger Doxsey Travel Prize to give this presentation.

Fortunately my good friend and fellow UW alum, Dr. Adam Kowalski, recorded the talk with my iPhone! This gives me a welcome chance to critique my public speaking (there is a lot to criticize), and an opportunity to share an astronomy research-based talk on this site!

Of course, there are several nifty visualizations for your viewing pleasure...
