Sunday, January 26, 2014

GeoGraphing with R; Part2: US State Heatmaps

The second geographical chart project I'll show is a classic.  In a national business it's often important to know the economic health of a region given different economic indicator values.  This logic uses a two color heatmap scheme for some intensity level visual feedback.This is another project I've developed at work and modified here to use public data.

The quantmod financial modeling library is the main source for this project.  I really like the design of this library.  The quantmod library features quick access to the most common sources of financial time series data (Google finance, Yahoo stocks, and FRED).  There are some great built in functions, a few of which I make use of below.  quantmod also has the added benefit of allowing you to trick coworkers into thinking you had a Bloomberg terminal installed overnight:


Great looking chart in three commands:
library(quantmod)
getSymbols('GOOG')
lineChart(Cl(GOOG['2011::']))

This project uses data from the St. Louis Federal Reserve's FRED repository.  I've written about my love for this public data before and using it with the quantmod library in R is even more convenient.

To create the heatmaps, I separated the project into two functions; the first creates a standardized data frame consisting of the time series data for each state, the second plots the state level data against a US state boundary map.


I've dubbed the first function "stFRED".  This function loops through each state using the built in state.abb data.  With each loop columns for date (from the quantmod xts index) and state name (from state.abb) are added to a data.frame creating a single standardized set structure.  After every state is added ldply is called to combine all sets.


#The stFRED function is built using the quantmod library to assign all US state level economic data to a single data frame
#I have chosen to for loop through each state in order to make use of the auto.assign=FALSE functionality
#which allows the printing each set instead of assigning it to separate sets
stFRED <- function(econ,begin="",ending=""){
  require(quantmod)
# The default state abbreviation set is used for the loop length and quantmod query   
  stDat <- lapply(state.abb, function(.state){
  input <- getSymbols(paste(.state,econ,sep=""),src="FRED", auto.assign=FALSE)
# Here I use the very effective subset function for the quantmod xts sets, using the variables begin and ending to subset  
  input <- input[paste(begin,'::',ending,sep="")]
# Converting the xts set to a data frame makes the data easier to manipulate for charting and other functions.
# This step assigns a date value to the index of the xts  
  input <- data.frame(econ_month=index(input),coredata(input))
# Since each state's indicator data includes a unique name ("VA...","GA...") I normalize them to one here
  colnames(input)[2]<-"ind_value"
# In order to separate the data later I include a variable for state name  
  input$St <- .state  
  input
  })   
# After returning each state dataset, I add them together using the very helpful ldply function
require(plyr)  
result <- ldply(stDat,data.frame)
  result
}  
 
Created by Pretty R at inside-R.org

The second function plots the state data onto a US map.  I borrowed much of the map plotting logic from Oscar Perpinan which I found from a StackOverflow question. This function could be used with other data, just note that I have used the names from the stFRED function for the plotted dataset.


#stFREDPlot creates a US state heatmap based on a state level data frame.
#While any state level set may be used, I have written this function to complement the stFRED function
#which produces a data frame which fits this function well.
stFREDPlot <- function(ds,nm=ptitle,ptitle=nm,begin=NULL, ending=NULL) {
# The libraries needed here are needed for the US state boundary mapping feature
  require(maps)
  require(maptools)
  require(mapproj)
  require(sp)
  require(plotrix)
  require(rgeos)
# To provide some default values for the begin and ending variables, I have set these variables to the minimum and maximum dates (full range)
# of the dataset.  This can be used for one or both, allowing partial subsets. 
  if ( is.null(begin) ) { begin<-min(ds$econ_month)}
  if ( is.null(ending) ) { ending<-max(ds$econ_month)}
  subds <- ds[ds$econ_month >= as.Date(begin) & ds$econ_month <= as.Date(ending),]
#The econSnap set is used for quick reference of the unique dates used  
  econSnap <- sort(unique(as.Date(subds$econ_month)))
#The dir.create function is used to create a folder to store the potentially many plot images created.
  if (is.null(nm) ) { print("Please enter a name or chart title") }
  dir <- paste("~//Plots//",nm,"//",sep="")
  dir.create(file.path(dir), showWarnings = FALSE)
#The variable i is used to reference the correct date in the econSnap set.  
  i <- 0
  for (n in econSnap) {
    plot.new()
    i <- i+1
#   Dataset limited to iterated reference date. 
    dataf <- data.frame(subds[subds$econ_month == n,])    
 
#   Much of this plotting logic built from tutorial found here: http://stackoverflow.com/questions/8537727/create-a-heatmap-of-usa-with-state-abbreviations-and-characteristic-frequency-in
#   Credit: StackOverflow user http://stackoverflow.com/users/964866/oscar-perpinan 
    dataf$states <- tolower(state.name[match(dataf$St,  state.abb)])
    mapUSA <- map('state',  fill = TRUE,  plot = FALSE)
    nms <- sapply(strsplit(mapUSA$names,  ':'),  function(x)x[1])
    USApolygons <- map2SpatialPolygons(mapUSA,  IDs = nms,  CRS('+proj=longlat'))
 
    idx <- match(unique(nms),  dataf$states)
    dat2 <- data.frame(value = dataf$ind_value[idx], match(unique(nms),  dataf$states))
    row.names(dat2) <- unique(nms)
 
    USAsp <- SpatialPolygonsDataFrame(USApolygons,  data = dat2)
    s = spplot(USAsp['value'],   col.regions = rainbow(100, start = 4/6, end = 1), main = paste(ptitle, ":  ", format(econSnap[i], format="%B %Y"),sep=""), colorkey=list(space='bottom'))
#   Status feedback given to user representing which date's US chart has been created.    
 print(format(econSnap[i], format="%B %Y"))
#   Plot saved as png.  Format chosen for malleability  in creating gif's and other manipulation
    png(filename=paste(dir,"//Map",substr(econSnap[i],1,7),".png",sep=""))
    print(s)
    dev.off() 
#   Dataset cleanup 
    rm(dataf)
    rm(dat2)
  }
}
Created by Pretty R at inside-R.org


As seen in the code, I have included some limited date subsetting funcationality and the resulting plots are saved for each available date.  This presents some possible problems if a very large data range is selected, but this iterative function will come in handy in part three of this series, animation.



No comments:

Post a Comment