I am a big fan of the RStudio IDE, which adds editing and data/file management features to the utilitarian basic R GUI. I have also tested similar code on a Raspberry Pi, where R installs with a simple call to apt-get.
After seeing a presentation of some of the geographic features of Tableau (GIS-lite within their visual analytics platform), I was inspired to experiment with mapping visuals, for free.
Using the wonderful wealth of user packages, I was able to get started quickly with some tutorials and documentation I found. I am especially indebted to Jeffrey Breen, the creator of the zipcode package, whose tutorial I found immensely helpful in creating this particular chart. This charting program is built around plotting latitude and longitude points against a map of the contiguous United States defined by state borders. Since the coordinates in the two sets are compatible, the points line up exactly with the borders.
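Limiting the points to the contiguous United States amounts to a bounding-box filter on latitude and longitude. Here is a minimal sketch in base R, assuming the same rough box used as axis limits in the plotting code later in this post (longitude -125 to -66, latitude 25 to 50). The data frame `pts`, the helper `inLower48`, and the coordinates are made up for illustration.

```r
# Hypothetical points; coordinates are approximate and for illustration only
pts <- data.frame(
  Bank = c("AAA", "AAA", "BBB"),
  latitude = c(40.75, 61.22, 37.79),
  longitude = c(-73.99, -149.90, -122.39)
)

# Keep only rows inside the assumed "lower 48" bounding box
inLower48 <- function(df) {
  subset(df, longitude >= -125 & longitude <= -66 &
             latitude >= 25 & latitude <= 50)
}

lower48 <- inLower48(pts)

# Count how many points fell outside the box (e.g. Alaska branches)
dropped <- nrow(pts) - nrow(lower48)
```

Filtering up front like this is optional, since the axis limits hide out-of-range points anyway, but it makes the number of excluded branches explicit.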
This particular chart is a version of a project I created for work, plotting the locations of bank branches for the top five banks by number of branches. In an era of thin branch banking, deep networks of brick and mortar branches aren't always considered key to retail banking success, but this type of analysis is still useful. This program is based on the publicly available branch location data from the FDIC, downloaded as csv files and parsed by R into data.frame objects. I have yet to find a public API for this data; bonus points to anyone who has.
The code below makes use of the zipcode package mentioned above as well as the ever-useful ggplot2 graphing library. It is ready to run on any R platform with these packages installed, provided the renamed FDIC csv files are in the working directory.
#Load needed libraries (Note that zipcode is used for a dataset)
library(zipcode)
library(ggplot2)
data(zipcode)

#Read and format .csv's downloaded from the FDIC
#Source http://research.fdic.gov/bankfind/
#csv's were renamed to the stock ticker of each bank but are otherwise unchanged
#The raw csv's include 7 rows of metadata, this is removed allowing row 8 to be used as headers
#Since Zip and Bank are all we care about, for now other headers are ignored
#Bank name is added to allow aggregation by entity later
#I've created a quick function for importing the data
readBank <- function(filename) {
  bank <- read.csv(paste(filename, ".csv", sep=""), header=TRUE, skip=7)
  bank$Bank <- filename
  bank
}

WFC <- readBank("WFC")
JPM <- readBank("JPM")
BAC <- readBank("BAC")
USB <- readBank("USB")
PNC <- readBank("PNC")

#Concatenate bank files together
top5 <- rbind(WFC, JPM, BAC, USB, PNC)

#Merge five bank set with zipcode to make mapping possible
top5Zip <- merge(zipcode, top5, by.x="zip", by.y="Zip")

#Much of the following has been taken from Jeffrey Breen at http://jeffreybreen.wordpress.com/2011/01/05/cran-zipcode/
#Begin mapping function. Colors denote bank names.
#"size" is increased to enhance the final plot
g <- ggplot(data=top5Zip) + geom_point(aes(x=longitude, y=latitude, colour=Bank), size=1.25)

#Simplify display and limit to the "lower 48"
#Some banks have Alaska branches (specifically Wells Fargo in this data), this is included, but ignored by the ggplot
g <- g + theme_bw() + scale_x_continuous(limits=c(-125,-66), breaks=NULL)
g <- g + scale_y_continuous(limits=c(25,50), breaks=NULL)

#Don't need axis labels
g <- g + labs(x=NULL, y=NULL)
g <- g + borders("state", colour="black", alpha=0.5)
g <- g + scale_color_brewer(palette="Set1")

#Arbitrary title
g <- g + ggtitle("Top Five Banks by Number of Branches") + theme(plot.title=element_text(lineheight=.8, face="bold"))
g <- g + theme(legend.direction="horizontal", legend.position="bottom", legend.box="vertical")
g
Following the creation of this plot I usually use the ggplot2 ggsave feature to save the plot to an image file:
ggsave("branches5.png", plot=g)
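ggsave also accepts explicit dimensions and a resolution, which keeps a wide US map from inheriting whatever size the current graphics device happens to have. Here is a minimal sketch, assuming a 10 by 6 inch canvas at 150 dpi (arbitrary choices, not settings from the original project) and a small stand-in plot `p` in place of `g`. The output path is a temporary file chosen for illustration.

```r
library(ggplot2)

# Hypothetical tiny plot standing in for the branch map g
p <- ggplot(data.frame(x = 1:3, y = 1:3)) +
  geom_point(aes(x = x, y = y))

# Write to a temporary location; swap in any file name you like
outFile <- file.path(tempdir(), "branches5_wide.png")

# width/height are in the given units; dpi controls the pixel resolution
ggsave(outFile, plot = p, width = 10, height = 6, units = "in", dpi = 150)
```

The file extension determines the output format, so changing the name to end in .pdf would produce a PDF instead.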
The resulting plot:
As seen with the simplicity of the merge statement, you could substitute nearly any zipcode based data. Other charts I've created have included asset locations and temperature data.
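One thing to watch when substituting your own data: read.csv will usually parse a ZIP column as a number, which drops leading zeros, while the zipcode dataset stores ZIPs as five-character strings. Here is a minimal sketch of padding the key before merging, using small made-up stand-ins (`lookup` for the zipcode table, `temps` for an arbitrary ZIP-keyed dataset). The values are illustrative, not real measurements.

```r
# Hypothetical miniature lookup table; coordinates are approximate
lookup <- data.frame(
  zip = c("02108", "10001", "94105"),
  latitude = c(42.36, 40.75, 37.79),
  longitude = c(-71.06, -73.99, -122.39),
  stringsAsFactors = FALSE
)

# Hypothetical ZIP-keyed data, with ZIPs parsed as numbers
temps <- data.frame(Zip = c(2108, 10001, 94105),
                    TempF = c(48, 52, 61))

# Without padding, "02108" has no counterpart in the numeric column
unpadded <- merge(lookup, temps, by.x = "zip", by.y = "Zip")

# Pad to five digits so both keys are comparable character strings
temps$Zip <- sprintf("%05d", as.integer(temps$Zip))
padded <- merge(lookup, temps, by.x = "zip", by.y = "Zip")
```

The zipcode package also ships a clean.zipcodes helper intended for this kind of cleanup, which may be the more convenient route when the package is already loaded.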
As a preview, the next R GeoGraphing post will focus on state-level mapping data and include some animation tricks.