
Friday, April 11, 2014

GeoGraphing with R; Part 5: Adding Intensity to the County Red/Blue Map

Today I took the Red/Blue exercise from the last post a bit further.  I'm a big advocate for the subtle use of multiple indicators in a single chart, and I thought I'd try this with the county level election graphs from the last post.  In my experience the usability of a chart peaks at three or so data points per visual; any more than this and there is a real risk of boring or confusing the reader.  The election maps in the last post had two layers (geography and winner), so I tried expanding the second layer to include a measure of intensity: the winner's share of the vote.





Luckily R includes a package that makes this quite simple.  The scales package includes the very useful alpha() function, which adjusts a color's transparency by a scalar between 0 and 1.  To achieve this in the scripts from the last post I simply had to create that scalar and use it to modify the existing color scheme.

This adds only two new lines to the earlier script, plus a tweak to the existing color-matching line:

#Calculate winning percentage to use for shading
elect12$WinPct <- elect12$Win_Votes/elect12$TOTAL.VOTES.CAST
#Create transparent colors using the scales package
library(scales)
elect12$alphaCol <- alpha(elect12$col, elect12$WinPct)
#Match colors to county.fips positions (the existing matching line, now using the alpha colors)
elect12colors <- elect12$alphaCol[match(cnty.fips, elect12$FIPS.Code)]


I really think this helps add another dimension to the chart, answering an inevitable question the reader might have: how decisive was the win in each county?

There is one issue with this methodology, however: the observed winning percentages cluster in a narrow band. Many counties sit in the mid-fifties, and the 2012 values ranged only from 46.2% (Eastford County, Connecticut) to 95.9% (King County, Texas). This compresses the alpha values and causes a washout effect on the colors in the chart. Non-linear scaling, such as a log scale or a binned color mapping, could help with this.
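As a rough sketch of the binned approach (the breakpoints and alpha levels here are arbitrary choices for illustration, not values from my scripts), the winning percentages could be cut into a few discrete transparency levels before being passed to alpha():

#Bin winning percentages into a few discrete levels to spread out
#the narrow observed range (roughly 46% to 96% in 2012)
alphaBin <- cut(elect12$WinPct, breaks = c(0, 0.5, 0.6, 0.7, 1), labels = FALSE)
#Map each bin to an increasingly opaque alpha value
alphaLevels <- c(0.25, 0.5, 0.75, 1)
elect12$alphaCol <- alpha(elect12$col, alphaLevels[alphaBin])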

Additionally, some measure of population size could be added to improve the readability of the chart. Election maps (as with most US value maps) suffer from the cognitive dissonance of a seemingly uniform land distribution overlaying a very uneven population distribution. I was really influenced by Mark Newman's fun take on election mapping. The cartograms he posted are especially interesting; I hope to create those in R sometime. I love the way the population-weighted cartogram lets the reader intuit the average value from an otherwise misleading two-color heatmap.

Thursday, April 10, 2014

GeoGraphing with R; Part 4: County Level Presidential Election Results


I've always loved US county level mapping because it provides enough detail to give an impression of complexity but retains a level of quick readability. I thought I'd try this out in R with a cliché but hopefully appealing set of charts. While I'm not particularly interested in political science, I've always loved the graphics that define it. There is a kind of pop art beauty in the Red/Blue charts we're all used to seeing, and I thought I'd try to mimic those using R.

First, the data had to be located. As always, county level data is a little more difficult to secure than other metrics. The basic problem with county level election results is that while the data is well sourced at the state government level, it is not easily found for all states in one accessible place. I found a few great resources when searching, and I ended up using two sources which seemed to be authorities on the subject, covering the 2008 and 2012 presidential elections.

2008
For the 2008 election I used a file from Ducky/Webfoot, a blogger who cleaned up a contemporary fileset provided by USA Today. Since this set was already well cleaned, there was little to do but read.csv() and code.

2012
Here I relied on a set provided by the Guardian, which was referenced by some interesting blog posts on the subject. The Guardian provides the data in .xls or Google Fusion Table format. I chose the .xls file, which I cleaned up a bit and re-saved as a .csv.

I began by making sure the county FIPS codes lined up with those in the R maps package. It turned out that both sets were well populated with FIPS codes, but the 2012 set seemed to be missing some detail for Alaska (here I imputed a Romney win for the 10 or so county-equivalents without data), and the 2008 set needed a transformation to create the five digit county FIPS code (state FIPS multiplied by 1,000, plus the county code; Texas's King County, for example, is 48 * 1000 + 269 = 48269).
After reading in the .csv files I assigned a color value (#A12830 and #003A6F, two of my favorite hex colors) to each FIPS code based on the winning candidate (classic Red and Blue, no surprises here). This allows me to do a little trick later and quickly assign each county a color. I then assigned these colors and the candidate names to vectors to help create a legend later on:
elect12$col <- ifelse(elect12$Won=="O","#003A6F","#A12830") 
colorsElect = c("#003A6F","#A12830")
leg <- c("Obama", "Romney")

Next I created a vector of colors matched and ordered to the county polygons, using the county.fips data from the maps package:
elect12colors <- elect12$col[match(cnty.fips, elect12$FIPS.Code)]
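Note that cnty.fips carries over from the earlier posts in this series. As a minimal sketch of how it can be built, the maps package ships a county.fips data frame pairing polygon names with FIPS codes, which can be ordered to match the polygons map() will draw:

library(maps)
#county.fips pairs each county polygon name with its FIPS code
data(county.fips)
#Order the FIPS codes to match the polygons that map("county") draws
cnty.fips <- county.fips$fips[match(map("county", plot = FALSE)$names,
                                    county.fips$polyname)]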


After this we're ready to build the map. Here I used the png device because I wanted to make a really big zoomable image. The map() call itself is pretty straightforward, but note that it is the "matched" color vector that assigns the Red/Blue to the map; this keeps the color mapping logic outside of the map() function.




2008 Election R Script
#R script for plotting the county level Presidential popular vote results of 2008
library(maps)
library(mapproj)
 
#Read in pre-formatted csv with binary M/O values for "Won"
elect08 <- read.csv("prez2008.csv")
 
#Assign appropriate color to winning candidate
#"#A12830" is a dark red, "#003A6F" a darker blue
elect08$col <- ifelse(elect08$Won=="M","#A12830","#003A6F")
 
#Transform for five digit FIPS from 2008 data
elect08$newfips <- (elect08$State.FIPS*1000)+elect08$FIPS
 
#Create vectors for legend
colorsElect <- c("#A12830","#003A6F")
leg <- c("McCain", "Obama")
 
#Match colors to county.fips positions (cnty.fips derived as shown above)
elect08colors <- elect08$col[match(cnty.fips, elect08$newfips)]
 
#Map values using standard map() function, output to png device
png("elect08.png", width = 3000, height = 1920, units = "px")
map("county", col = elect08colors, fill = TRUE, resolution = 0,
    lty = 0, projection = "polyconic")
#Add white borders for readability
map("county", col = "white", fill = FALSE, add = TRUE, lty = 1, lwd = 0.2,
    projection = "polyconic")
title("2008 Presidential Election Results by County", cex.main = 5)
box()
legend("bottomright", leg, horiz = FALSE, fill = colorsElect, cex = 4)
dev.off()



2012 Election Map R Script
#R script for plotting the county level Presidential popular vote results of 2012
library(maps)
library(mapproj)
 
#Read in pre-formatted csv with binary R/O values for "Won"
elect12 <- read.csv("2012_Elect.csv")
 
#Assign appropriate color to winning candidate
#"#A12830" is a dark red, "#003A6F" a darker blue
elect12$col <- ifelse(elect12$Won=="O","#003A6F","#A12830")
 
#Create vectors for legend
colorsElect <- c("#003A6F","#A12830")
leg <- c("Obama", "Romney")
 
#Match colors to county.fips positions (cnty.fips derived as shown above)
elect12colors <- elect12$col[match(cnty.fips, elect12$FIPS.Code)]
 
#Map values using standard map() function, output to png device
png("elect12.png", width = 3000, height = 1920, units = "px")
map("county", col = elect12colors, fill = TRUE, resolution = 0,
    lty = 0, projection = "polyconic")
#Add white borders for readability
map("county", col = "white", fill = FALSE, add = TRUE, lty = 1, lwd = 0.2,
    projection = "polyconic")
title("2012 Presidential Election Results by County", cex.main = 5)
box()
legend("bottomright", leg, horiz = FALSE, fill = colorsElect, cex = 4)
dev.off()

Sunday, January 19, 2014

Favorite Tools: FRED and St. Louis Fed Research Tools

I'd like to use this series as a set of love notes to my favorite data tools.  Some of these I use almost constantly at work; others are personal favorites I've come across.


FRED is a tool I came across a few years ago while reading economics blogs.  The distinctive color of a standard FRED graph (with obligatory recession shading) was something I began to associate with the econ blogger crowd.  It seems this has been noticed by many; Paul Krugman, whose blog was one of the first places I noticed FRED, is quoted as saying, "I think just about everyone doing short-order research — trying to make sense of economic issues in more or less real time — has become a FRED fanatic."

After using these tools at work and at home I have come to feel the same way, even evangelizing FRED's merits to my coworkers and friends.

FRED graphs are distinctive and immediately recognizable


In my work in data analysis at a national bank, I have come to greatly value FRED for two main reasons: it is a singularly well organized and well populated database, and it allows immediate reference to data that is often useful in a one-off fashion.  Pulling this data up during a meeting has more than once earned me some recognition for economic knowledge that might not have otherwise occurred.

The breadth of data available is somewhat astounding.  International data might usually take you all over the web and to a few commercial sites, but FRED has enough to do most high level macroeconomic survey work.  I find the more obscure metrics very interesting at times, and it's fun to eyeball them for trends.

It's too easy to make weird charts...


After discovering FRED's website I was ecstatic to find that an Excel Add-In had been developed.  I immediately made use of the feature and made sure I spread the news around.  Being able to quickly pull in common economic data while doing simple (or complex) analysis can save a lot of time, and outsourcing the data storage and update costs to FRED is wonderful.  Cutting down on the user table creation and maintenance I owned was a real time saver.

In order to facilitate access to my company's internal economic data hub, I even created my own version of the FRED Excel Add-In, which I named ED.  Using some simple VBA GUI elements (drop-downs, radio buttons, many MsgBox's...) and an ODBC connection I was able to mimic the Excel Add-In functionality of FRED.  Adding some charting code, I was able to mimic the distinctive graphs as well.  Given that the data is proprietary, I don't see any issue with my imitation of FRED; I view it as a labor of love in tribute to the data tool.
Tying FRED into R was an obvious next step, and I've already begun to make use of this data.  Being able to pull series down into the R environment makes it even easier to manipulate the data quickly, without worrying about Excel's resource demands (Autosave, I'm looking at you!) or adding the data to a database structure.  An R programming project I'll detail later exhibits geographical plotting using similar data; maybe I'll tie FRED in to show off the functionality.
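As a quick sketch of one way to do this (assuming the quantmod package, which includes a FRED data source; UNRATE, the civilian unemployment rate, is just an example series id):

library(quantmod)
 
#Pull the civilian unemployment rate series from FRED as an xts object
getSymbols("UNRATE", src = "FRED")
 
#A quick look at the series
plot(UNRATE, main = "Civilian Unemployment Rate (FRED series UNRATE)")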

I also happily own the FRED mobile app, which I find entirely too amusing; it has come in handy for wonky discussions and for proving my data nerdiness to anyone in sight.

If they sold T-shirts, sign me up for two.


The St. Louis Fed includes three other tools: GeoFRED (data mapping), ALFRED (historical vintages of economic series), and CASSIDI (a personal favorite of mine, which details US banking industry data).  I believe I'll include love notes on these as well, CASSIDI especially.