Erich Donahue: April 2014

Tuesday, April 22, 2014

Pretty, Fast D3 charts using Datawrapper

While reading a news article I came across a US state cloropleth that piqued my curiosity.

An economic news article on state unemployment rates at The New Republic included a state level map with a two color scale and tooltips. As always when I see a new chart that I like I look for two things to steal; the data and the method used to make the chart.

In this case I was in luck. The embedded chart included links for both the method used and the data (great features of Datawrapper).

Datawrapper.de is a set of open source visual analytics tools (mostly Javascript from what I've seen) integrated into an easy to use UI. For those of us still learning D3.js this is a great way to build beautiful, interactive charts with the style and capabilities of the D3 style graphics which are used by many online publications and bloggers.

State level maps (Maps still in Beta at time of writing) are really easy and fun to create. To build you simply start by uploading or pasting in your data. I was able to simply paste in two columns of data, state abbreviation and value. Using R this might have taken me fifteen or twenty minutes or so, at Datawrapper this only took five

I took another stab at this data, trying out one of the line chart templates provided. Here I wanted to try to mimic (substance not style) one of my most favorite data tools, FRED:

Following similar steps to the state US map above I simply pasted in the time series data from FRED and moved through the Datawrapper wizard. I simply selected line chart then updated a few options to better mimic FRED (sadly the iconic recession shading is not available natively, or grid lines). In some ways the result is even more aesthetically pleasing, and could be a nice easy addition to a blog post or article

Friday, April 18, 2014

googleMap-ing with R; Communte Loop

There are two main routes I can take to get to work each day, the two make a comically circular commute when both are taken.

I thought I'd use that commute to try out yet another cool use case for R, GPS plotting with dynamically created HTML/JS using the googleVis package.

First we need some data points.

For simplicity I logged the GPS data using my phone with the GPS Logger for Android app. This app provides coordinates logged to .gpx, .kml, or comma delimited .txt files at a chosen interval. A few metrics are included like elevation, speed and bearing. I plotted my morning and evening commutes while driving two different routes to the office.

The code for this very simple, the gvisMap call to the Google Maps API does almost all of the heavy lifting. We simply pass gvisMap the latitude and longitude coordinates in the expected format (lat:lon) and then an optional label for each point plotted as a "Tip". Here I chose elevation relative to sea level, but any parameter in the set could be chosen.

#googleVis must be installed
library(googleVis)
 
#Read in the gps files.  I used .txt, but .gpx can be used also, with preparation
gps418 <- read.csv("20140418.txt")
gps418_2 <- read.csv("20140418_2.txt")
 
#Add morning and evening commutes
gps418<-merge(gps418,gps418_2, all.x=TRUE, all.y=TRUE)
 
#Prepare Latitude and Longitude for gvisMap
gps418$latlon <- paste(gps418$lat,":",gps418$lon, sep="")
 
#Call gvisMap function, here I'm using elevation as a label
gpsTry <- gvisMap(gps418,"latlon", "elevation",
                  options=list(showTip=TRUE, showLine=TRUE, enableScrollWheel=TRUE,
                               mapType='hybrid', useMapTypeControl=TRUE,
                               width=1600,height=800))
 
#Plot (Send gvisMap generated HTML to browser at local address)
plot(gpsTry)
 
#Yeah, it's that easy

Created by Pretty R at inside-R.org

The results are automatically rendered as a local address in your browser with the generated HTML calling the Google API. I've pasted the relevant portion here to display the map:

Hat tip to Geo-Kitchen, I noticed their post while doing this one. I like the plot of my alma mater

Friday, April 11, 2014

GeoGraphing with R; Part 4: Adding intensity to the county Red/Blue Map

Today I took the Red/Blue exercise from the last post a bit further. I'm a big advocate for the power of the subtle use of multiple indicators in a single chart and I thought I'd try this with the county level election graphs from the last post. In my experience the usability of chart peaks at three or so data points per visual. Any more than this and there is a real risk of boredom or confusion for the reader. The election maps in the last post had two layers (geography and winner), earlier I tried out expanding the second layer to include a measure of intensity.

Luckily R includes a package that makes this quite simple. The scales package includes the very useful alpha() function which transforms a color value using some scalar modifier. To achieve this in the scripts I used for the last post I simply had to create some scalar and use that to modify the existing color scheme.

This only adds two additional lines to the earlier scripts:

#Calculate winning percentage to use for shading
elect12$WinPct <- elect12$Win_Votes/elect12$TOTAL.VOTES.CAST
#Create transparent colors using scales package
elect12$alphaCol <- alpha(elect12$col,elect12$WinPct)
#Match colors to county.FIPS positions

Created by Pretty R at inside-R.org

I really think this helps add another dimension to the chart, answering an inevitable question that the reader might have.

There are a couple of issues with this methodology however, since the observed winning percentage values are so centered around certain values. In mid fifties for many counties, 2012 range was 46.2% (Eastford county Connecticut) to 95.9% (King county Texas). This causes a washout effect on the colors in the chart. A non-linear scaling using a log scale or binned color mapping could help with this.

Additionally, some measure of population size could be added to improve the readability of the chart. Election maps (as with most US level value maps) suffer from the cognitive dissonance of a seemingly uniform land distribution with a disparate population distribution. I was really influenced by Mark Newmans' fun take on election mapping. The cartographs he posted are especially interesting, I hope to create those in R sometime. I love the way that the population weighted cartograph allows the reader to intuit the average value from an otherwise misleading two color heatmap.

Thursday, April 10, 2014

GeoGraphing with R; Part 4: County Level Presidential Election Results

I've always loved US county level mapping because it provides enough detail to give an impression of complexity but retains a level of quick readability. I thought I'd try this out in R out a with a cliche but hopefully appealing set of charts. While I'm not particularly interested in political science I've always loved the graphics that define it. There is a kind of pop art beauty in the Red-Blue charts we're all used to seeing, and I thought I'd try to mimic those using R.

First, the data had to be located. As always, it's a little more difficult securing county level data than other metrics. The basic problem for election results county data is that while the data is well sourced at a state government level the county data is not easily found for all states in one place in an accessible way. I found a few great resources when searching for this data, and I ended up using two sources which seemed to be authorities on the subject for both the 2008 and 2012 presidential elections.

2008
For the 2008 election I used a file from Ducky/Webfoot, a blogger who cleaned up a contemporary fileset provided by USA Today. Since this set was already well cleaned there was little to do but read.csv() and code.

2012
Here I relied on a set provided by the Guardian which was referenced by some interesting blog posts on the subject. The Guardian provides the data in .xls or Google fusion table format. I chose to use the .xls file, which I cleaned somewhat and re-saved as a .csv.

I began by making sure the county FIPS codes lined up with those in the R maps package. It turned out that both sets were well populated with FIPS, but 2012 seemed to be missing some detail for Alaska (here I imputed a Romney win for the 10 or so states without data) and the 2008 set needed a transformation to create the five digit county FIPS code (state level multiplied by 1000 + county)
After reading in the .csv's I assigned a color value (#A12830 and #003A6F two of my favorite hex colors) to each FIPS based on the winning candidate (classic Red and Blue, no surprises here). This allows me to do a little trick later and quickly assign each county a color. I then assigned these colors and the Candidate names to lists to help create a legend later on:

elect12$col <- ifelse(elect12$Won=="O","#003A6F","#A12830") 
colorsElect = c("#003A6F","#A12830")
leg <- c("Obama", "Romney")

Created by Pretty R at inside-R.org

Next I created a list of colors matched and sorted on county from the county.fips data in the maps package:

elect12colors <- elect12$col [match(cnty.fips, elect12$FIPS.Code)]

Created by Pretty R at inside-R.org

After this we're ready to build the map. Here I used the png device because I wanted to make a really big zoomable image. The map() function here is pretty straightforward but I'll note that it is the "matched" color list that I'm using to assign the Red/Blue to the map in order to separate the color mapping outside of the map() function.

2008 Election R script

#R script for plotting the county level Presidential popular vote results of 2008#Read in pre-formatted csv with binary M/O values for "Won" 
elect08 <- read.csv("prez2008.csv") 
#Assign appropriate color to winning candidate#"#A12830" is a dark red, "#003A6F" a darker blue
elect08$col <- ifelse(elect08$Won=="M","#A12830","#003A6F") 
#Transform for FIPS from 2008 data
elect08$newfips <- (elect08$State.FIPS*1000)+elect08$FIPS
 
#Create lists for legend
colorsElect = c("#A12830","#003A6F")
leg <- c("McCain", "Obama") 
#Match colors to county.FIPS positions
elect08colors <- elect08$col [match(cnty.fips, elect08$newfips)] 
#Map values using standard map() function, output to png devicepng("elect08.png",width = 3000, height = 1920, units = "px") 
map("county", col = elect08colors, fill = TRUE, resolution = 0,
    lty = 0, projection = "polyconic")#Add white borders for readability
map("county", col = "white", fill = FALSE, add = TRUE, lty = 1, lwd = 0.2,
    projection="polyconic")title("2008 Presidential Election Results by County", cex.lab=5, cex.axis=5, cex.main=5, cex.sub=5)box()legend("bottomright", leg, horiz = FALSE, fill = colorsElect, cex = 4)dev.off()

Created by Pretty R at inside-R.org

2012 Election Map R Script

#R script for plotting the county level Presidential popular vote results of 2012#Read in pre-formatted csv with binary R/O values for "Won" 
elect12 <- read.csv("2012_Elect.csv") 
#Assign appropriate color to winning candidate#"#A12830" is a dark red, "#003A6F" a darker blue
elect12$col <- ifelse(elect12$Won=="O","#003A6F","#A12830") 
 
#Create lists for legend
colorsElect = c("#003A6F","#A12830")
leg <- c("Obama", "Romney") 
#Match colors to county.FIPS positions
elect12colors <- elect12$col [match(cnty.fips, elect12$FIPS.Code)] 
#Map values using standard map() function, output to png devicepng("elect12.png",width = 3000, height = 1920, units = "px") 
map("county", col = elect12colors, fill = TRUE, resolution = 0,
    lty = 0, projection = "polyconic")#Add white borders for readability
map("county", col = "white", fill = FALSE, add = TRUE, lty = 1, lwd = 0.2,
    projection="polyconic")title("2012 Presidential Election Results by County", cex.lab=5, cex.axis=5, cex.main=5, cex.sub=5)box()legend("bottomright", leg, horiz = FALSE, fill = colorsElect, cex = 4)dev.off()

Created by Pretty R at inside-R.org

Erich Donahue