I'd like to share some graphing work I've done with the R programming language. I have been interested in R for a few years now and have enjoyed the extremely intuitive platform it provides for data analysis. I don't make much use of R's powerful statistical tools, but that is part of its charm: it provides a platform for almost any use you could need, with an interface as intuitive as Python's. I keep R on my personal Ubuntu and Windows machines, use it at work, and have even installed it on my Raspberry Pis.
I am a big fan of the RStudio IDE, which adds editing and data/file management features on top of the utilitarian basic R GUI. I have also tested similar code on a Raspberry Pi, where R installs with a simple call to apt-get.
After seeing a demonstration of some of the geographic features of Tableau (GIS-lite within their visual analytics platform), I was inspired to experiment with mapping visuals, for free.
Thanks to the wonderful wealth of user packages, I was able to get started on this quickly using some tutorials and documentation I found. I am especially indebted to Jeffrey Breen, the creator of the zipcode package, whose tutorial I found immensely helpful in creating this particular chart. The program plots latitude and longitude points against a map of the contiguous United States drawn from state borders. Because the borders and the points use the same longitude/latitude coordinates, the points line up directly with the borders.
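Because the points and borders share plain longitude/latitude values, limiting the map to the contiguous states comes down to a rectangle test on those two columns. The sketch below assumes a rough lower-48 box of -125 to -66 longitude and 24 to 50 latitude (close to the plotting window used later); the helper `in_lower48()` and the sample coordinates are hypothetical and not part of the original code:

```r
# Hypothetical helper: flag points inside a rough lower-48 bounding box.
# The box edges are approximate and chosen only for illustration.
in_lower48 <- function(longitude, latitude,
                       lon_range = c(-125, -66), lat_range = c(24, 50)) {
  longitude >= lon_range[1] & longitude <= lon_range[2] &
    latitude >= lat_range[1] & latitude <= lat_range[2]
}

# Made-up example points (not FDIC data): approximate city-center coordinates
pts <- data.frame(
  place     = c("Minneapolis", "Boston", "Anchorage"),
  longitude = c(-93.27, -71.06, -149.90),
  latitude  = c(44.98, 42.36, 61.22)
)

# Anchorage falls outside the box on both longitude and latitude
pts$lower48 <- in_lower48(pts$longitude, pts$latitude)
print(pts)
```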
This particular chart is a version of a project I created for work, plotting the branch locations of the top five banks by number of branches. In an era of thin branch banking, deep networks of brick-and-mortar branches aren't always considered key to retail banking success, but this type of analysis is still useful. The program is based on publicly available branch location data from the FDIC, downloaded as CSV files and parsed by R into data.frame objects. I have yet to find a public API for this data; bonus points to anyone who has.
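As a sketch of the parsing step, the snippet below mocks an FDIC-style export in memory: seven metadata lines followed by a header row. The column names `Institution` and `Zip` and the sample rows are assumptions for illustration only; the real files have more columns. It shows `skip = 7` promoting line 8 to the header, and `colClasses` keeping ZIP codes as text so a leading zero is not lost:

```r
# Hypothetical stand-in for an FDIC export: 7 metadata lines, then a header row
raw <- c(
  paste("metadata line", 1:7),
  "Institution,Zip",
  "Example Bank,02110",
  "Example Bank,55402"
)

# skip = 7 drops the metadata so line 8 becomes the header;
# colClasses keeps Zip as character so "02110" is not read as the number 2110
bank <- read.csv(text = raw, header = TRUE, skip = 7,
                 colClasses = c(Zip = "character"))
str(bank)
```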
The code below makes use of the zipcode package mentioned above as well as the ever-useful ggplot2 graphing library (whose borders() function relies on the maps package for the state outlines). It is ready to run on any R platform with these packages installed.
#Load needed libraries (zipcode supplies the zip-to-coordinate dataset; maps supplies the state outlines drawn by borders())
library(zipcode)
library(ggplot2)
library(maps)
data(zipcode)
#Read and format .csv's downloaded from the FDIC
#Source http://research.fdic.gov/bankfind/
#csv's were renamed to the stock ticker of each bank but are otherwise unchanged
#The raw csv's include 7 rows of metadata; skip=7 drops them so row 8 is used as the header
#Since Zip and Bank are all we care about for now, the other columns are ignored
#Bank name is added to allow aggregation by entity later
#I've created a quick function for importing the data
readBank <- function(filename) {
  bank <- read.csv(paste(filename, ".csv", sep=""), header=TRUE, skip=7)
  #Normalize Zip to 5-character text (restores leading zeros, strips ZIP+4 suffixes) so it matches zipcode$zip
  bank$Zip <- clean.zipcodes(bank$Zip)
  bank$Bank <- filename
  bank
}
WFC <- readBank("WFC")
JPM <- readBank("JPM")
BAC <- readBank("BAC")
USB <- readBank("USB")
PNC <- readBank("PNC")
#Concatenate bank files together
top5 <- rbind(WFC,JPM, BAC, USB, PNC)
#merge five bank set with zipcode to make mapping possible
top5Zip <- merge(zipcode,top5, by.x= "zip",by.y ="Zip" )
#Much of the following has been taken from Jeffrey Breen at http://jeffreybreen.wordpress.com/2011/01/05/cran-zipcode/
#Begin mapping function. Colors denote bank names. "size" is increased to enhance the final plot
g <- ggplot(data=top5Zip) + geom_point(aes(x=longitude, y=latitude, colour=Bank), size = 1.25)
#Simplify display and limit to the "lower 48"
#Some banks have Alaska branches (specifically Wells Fargo in this data); those rows stay in the data,
#but coord_cartesian() zooms the view so they fall outside the plotted area without clipping the state borders
g <- g + theme_bw() + scale_x_continuous(breaks = NULL) + scale_y_continuous(breaks = NULL)
g <- g + coord_cartesian(xlim = c(-125,-66), ylim = c(24,50))
#Don't need axis labels
g <- g + labs(x=NULL, y=NULL)
g <- g + borders("state", colour="black", alpha=0.5)
g <- g + scale_color_brewer(palette = "Set1")
#Arbitrary title
g <- g + ggtitle("Top Five Banks by Number of Branches") + theme(plot.title = element_text(lineheight=.8, face="bold"))
g <- g+ theme(legend.direction = "horizontal", legend.position = "bottom", legend.box = "vertical")
g
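Since `readBank()` tags every row with a `Bank` column, branch counts per entity fall out of a simple `table()` call. The data frame below is a made-up stand-in for `top5Zip` (a handful of rows, not FDIC figures), so this is only a sketch of the aggregation step:

```r
# Toy stand-in for the merged top5Zip data frame (made-up rows, not FDIC data)
top5Zip <- data.frame(
  zip  = c("55402", "02110", "99501", "10001", "55402"),
  Bank = c("USB", "BAC", "WFC", "JPM", "WFC")
)

# Count rows (branches) per bank, using the Bank column added in readBank()
per_bank <- as.data.frame(table(Bank = top5Zip$Bank), responseName = "branches")

# Order from most to fewest branches (ties keep alphabetical order)
per_bank <- per_bank[order(-per_bank$branches), ]
print(per_bank)
```

Because `merge()` performs an inner join by default, running the same count on `top5` (before the merge) and comparing it with `top5Zip` would show how many branches were dropped because their ZIP code had no match in the zipcode table.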
After creating the plot, I usually save it to an image file with ggplot2's ggsave() function:
ggsave("branches5.png", plot=g)
The resulting plot:
As the simplicity of the merge statement shows, you could substitute nearly any ZIP-code-based data. Other charts I've created have included asset locations and temperature data.
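For example, any table keyed by ZIP code can be attached to coordinates the same way. In the sketch below, `zips` is a small hypothetical stand-in for the zipcode dataset's `zip`, `latitude` and `longitude` columns, and `temps` holds made-up temperature readings whose ZIP codes arrived as numbers. Padding them back to five characters before the merge avoids silently losing New England rows such as 02110:

```r
# Hypothetical zip lookup table standing in for data(zipcode)
zips <- data.frame(
  zip       = c("02110", "55402", "94105"),
  latitude  = c(42.36, 44.98, 37.79),
  longitude = c(-71.05, -93.27, -122.40),
  stringsAsFactors = FALSE
)

# Made-up temperature readings; Zip was stored as a number, so 02110 became 2110
temps <- data.frame(Zip = c(2110, 55402, 94105), temp_f = c(41, 28, 58))

# Pad back to five characters, otherwise "2110" never matches "02110"
temps$Zip <- sprintf("%05d", as.integer(temps$Zip))

# Same merge pattern as the bank data: zip on the left, Zip on the right
tempsZip <- merge(zips, temps, by.x = "zip", by.y = "Zip")
print(tempsZip)
```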
As a preview, the next R GeoGraphing post will focus on state-level mapping data and include some animation tricks.