Monday, January 27, 2014

GeoGraphing with R; Part 3: Animation

To finish out this series I'll show a method I recently used to create animated .GIFs from R plots.

I've found a few methods of doing this, almost all of which use ImageMagick or GraphicsMagick, command-line tools that convert image files into GIFs, MPEGs, and other formats.

Most of the examples I've found involve calling the "convert" command (a PATH reference to ImageMagick or GraphicsMagick) within a function of the animation package.

A few of these make use of the saveGIF() function available from the animation library.  I like the intuitive nature of this function, but I found that I wasn't able to control the ImageMagick conversion as well as I'd like.  Using ImageMagick directly from the command line along with some settings tweaks gave me more nuanced control of the GIF creation.

I followed a process of creating the images first (PNGs created through a controlled loop, see Part 2) then calling ImageMagick's convert function directly from the command line using a command like:

C:\Users\Erich\Documents\Plots\State UNMP>convert -delay 100 *.png "UNMP2012+.gif"


At work I took a slightly different tack and used a different external conversion program, PhotoScape.  The PhotoScape GUI was easy to use, though not as hack-y as ImageMagick.

Finding the right delay is key; I've found a delay of around 80 (ImageMagick measures delay in hundredths of a second, so roughly 0.8 seconds per frame) works well for many charts.
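If you'd rather drive the whole conversion from inside R, the same command can be shelled out with system().  This is only a minimal sketch, assuming ImageMagick's convert is on your PATH and the Part 2 PNGs already sit in the folder shown (on Windows you may need the full path to ImageMagick's convert.exe, since Windows ships its own convert utility):

#Build the GIF from within R by shelling out to ImageMagick
#-delay is in hundredths of a second, so 80 is roughly 0.8 seconds per frame; -loop 0 repeats forever
setwd("~/Plots/State UNMP")
system('convert -delay 80 -loop 0 *.png "UNMP2012+.gif"')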

Favorite Tools: NDBC and the Chesapeake Bay Interpretive Buoy System

This one is not really a tool I use at work, just a favorite public data source of mine.  I really love the combination of physical computing and data.  The NOAA Buoy system might be considered one of the most widespread internet of things installations.

The NOAA Buoy System consists of a network of buoys from different programs, all tracked by NOAA.  Many of these are not under the direct supervision of NOAA; some are academic, others are state or local government installations.

The National Data Buoy Center website and database provides instant access to the status of many of these buoys.  Also included within the network are the observations from volunteer ships outfitted with sensors and telemetry equipment.  Individual buoys can be found via a map applet or within a mobile optimized site.


The Chesapeake Bay Interpretive Buoy System is part of the NDBC network and was designed to track the health of the bay using a network of smart buoys installed around the Chesapeake Bay watershed.  The smart buoys include a suite of sensors any DIY Arduino weather station builder might dream of.  The program supplements this environmental data with a parallel history lesson, combining the bay's health with the history of development in the watershed, including the connection of the buoy locations with the historic journeys of a favorite historical figure of mine, Captain John Smith.

The CBIBS includes some cool data visualization features, like a graphing applet and csv downloads.  It also has a mobile app, which I've added to my wonkApp collection along with the FRED app.


While mariners and scientists use this data more urgently, I love checking it to get a sense of conditions at some of my favorite places in the area: the lower Potomac near where I grew up, Jamestown Island (visible from the fort site), and the Upper Potomac (visible from my apartment).


I've even written some shell commands which I use to check on the Alexandria Buoy for real-time weather stats 200 yards from my apartment building while at work:

alias bTemp='wget -q http://www.ndbc.noaa.gov/mobile/station.php?station=44042 -O - | grep Air | cut -c1-15'

(Buoy Station changed in code to reference an active station.  Sadly, the Upper Potomac is offline for winter maintenance)
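The same data is easy to pull into R.  This is a rough sketch under a few assumptions: that NDBC still publishes real-time standard meteorological observations as plain text at the URL pattern below (the same station ID as the alias above), and that air temperature (ATMP) sits in the usual 14th column of that format:

#Pull the latest observations for buoy 44042 into R (assumed NDBC realtime2 text format)
buoyURL <- "http://www.ndbc.noaa.gov/data/realtime2/44042.txt"
#The file starts with two header rows (column names and units); NDBC marks missing values as "MM"
obs <- read.table(buoyURL, skip = 2, na.strings = "MM")
#In the standard format, column 14 is air temperature (ATMP, degrees C); the first row is the newest reading
obs[1, 14]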

Sunday, January 26, 2014

GeoGraphing with R; Part 2: US State Heatmaps

The second geographical chart project I'll show is a classic.  In a national business it's often important to know the economic health of a region given different economic indicator values.  This logic uses a two-color heatmap scheme for intensity-level visual feedback.  This is another project I've developed at work and modified here to use public data.

The quantmod financial modeling library is the backbone of this project.  I really like the design of this library.  quantmod features quick access to the most common sources of financial time series data (Google Finance, Yahoo Finance, and FRED).  There are some great built-in functions, a few of which I make use of below.  quantmod also has the added benefit of allowing you to trick coworkers into thinking you had a Bloomberg terminal installed overnight:


Great looking chart in three commands:
library(quantmod)
getSymbols('GOOG')
lineChart(Cl(GOOG['2011::']))

This project uses data from the St. Louis Federal Reserve's FRED repository.  I've written about my love for this public data before and using it with the quantmod library in R is even more convenient.

To create the heatmaps, I separated the project into two functions; the first creates a standardized data frame consisting of the time series data for each state, the second plots the state level data against a US state boundary map.


I've dubbed the first function "stFRED".  This function iterates over each state using the built-in state.abb data.  For each state, columns for date (from the quantmod xts index) and state name (from state.abb) are added to a data.frame, creating a single standardized structure.  Once every state has been processed, ldply combines the individual sets into one data frame.


#The stFRED function uses the quantmod library to gather a given US state-level economic series for all states into a single data frame
#I iterate over each state (via lapply) in order to make use of the auto.assign=FALSE functionality,
#which returns each set directly instead of assigning it to a separate object in the workspace
stFRED <- function(econ, begin="", ending=""){
  require(quantmod)
  # The built-in state abbreviation set (state.abb) drives the iteration and the quantmod query
  stDat <- lapply(state.abb, function(.state){
    input <- getSymbols(paste(.state, econ, sep=""), src="FRED", auto.assign=FALSE)
    # Here I use xts's very effective date-range subsetting, with the begin and ending variables defining the window
    input <- input[paste(begin, '::', ending, sep="")]
    # Converting the xts set to a data frame makes the data easier to manipulate for charting and other functions.
    # This step assigns a date value taken from the index of the xts
    input <- data.frame(econ_month=index(input), coredata(input))
    # Since each state's indicator column has a unique name ("VA...", "GA...") I normalize them to one name here
    colnames(input)[2] <- "ind_value"
    # In order to separate the data later I include a variable for the state name
    input$St <- .state
    input
  })
  # After returning each state's dataset, I add them together using the very helpful ldply function
  require(plyr)
  result <- ldply(stDat, data.frame)
  result
}
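As a usage sketch (assuming FRED still follows its state-abbreviation-plus-series naming convention, e.g. VAUR for Virginia's unemployment rate), pulling every state's unemployment rate since 2005 would look something like:

#Hypothetical example call: state unemployment rate series ("UR", e.g. VAUR, GAUR) since 2005
stUR <- stFRED("UR", begin = "2005-01-01")
head(stUR)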
 

The second function plots the state data onto a US map.  I borrowed much of the map plotting logic from Oscar Perpinan, whose answer I found via a StackOverflow question.  This function could be used with other data; just note that I have used the column names from the stFRED function for the plotted dataset.


#stFREDPlot creates a US state heatmap based on a state-level data frame.
#While any state-level set may be used, I have written this function to complement the stFRED function,
#which produces a data frame that fits this function well.
stFREDPlot <- function(ds, nm=ptitle, ptitle=nm, begin=NULL, ending=NULL) {
  # The libraries loaded here support the US state boundary mapping
  require(maps)
  require(maptools)
  require(mapproj)
  require(sp)
  require(plotrix)
  require(rgeos)
  # To provide default values for the begin and ending variables, I set them to the minimum and maximum dates (full range)
  # of the dataset.  Either or both can be supplied, allowing partial subsets.
  if ( is.null(begin) ) { begin <- min(ds$econ_month) }
  if ( is.null(ending) ) { ending <- max(ds$econ_month) }
  subds <- ds[ds$econ_month >= as.Date(begin) & ds$econ_month <= as.Date(ending),]
  # The econSnap set is used for quick reference of the unique dates plotted
  econSnap <- sort(unique(as.Date(subds$econ_month)))
  # A name or chart title is required since it determines the output folder and plot title
  if ( missing(nm) && missing(ptitle) ) { stop("Please enter a name or chart title") }
  # The dir.create function creates a folder to store the potentially many plot images created
  dir <- paste("~//Plots//", nm, "//", sep="")
  dir.create(file.path(dir), showWarnings = FALSE)
  # The variable i is used to reference the correct date in the econSnap set
  i <- 0
  for (n in econSnap) {
    plot.new()
    i <- i + 1
    # Dataset limited to the iterated reference date
    dataf <- data.frame(subds[subds$econ_month == n,])

    # Much of this plotting logic built from tutorial found here: http://stackoverflow.com/questions/8537727/create-a-heatmap-of-usa-with-state-abbreviations-and-characteristic-frequency-in
    # Credit: StackOverflow user http://stackoverflow.com/users/964866/oscar-perpinan
    dataf$states <- tolower(state.name[match(dataf$St, state.abb)])
    mapUSA <- map('state', fill = TRUE, plot = FALSE)
    nms <- sapply(strsplit(mapUSA$names, ':'), function(x) x[1])
    USApolygons <- map2SpatialPolygons(mapUSA, IDs = nms, CRS('+proj=longlat'))

    idx <- match(unique(nms), dataf$states)
    dat2 <- data.frame(value = dataf$ind_value[idx], match(unique(nms), dataf$states))
    row.names(dat2) <- unique(nms)

    USAsp <- SpatialPolygonsDataFrame(USApolygons, data = dat2)
    s <- spplot(USAsp['value'], col.regions = rainbow(100, start = 4/6, end = 1), main = paste(ptitle, ":  ", format(econSnap[i], format="%B %Y"), sep=""), colorkey=list(space='bottom'))
    # Status feedback given to the user showing which date's US chart has been created
    print(format(econSnap[i], format="%B %Y"))
    # Plot saved as PNG.  Format chosen for malleability in creating GIFs and other manipulation
    png(filename=paste(dir, "//Map", substr(econSnap[i], 1, 7), ".png", sep=""))
    print(s)
    dev.off()
    # Dataset cleanup
    rm(dataf)
    rm(dat2)
  }
}
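And to tie the two functions together, a hypothetical call (the folder name and chart title here are arbitrary) might look like:

#Plot the state unemployment data built by stFRED above; one PNG per month lands in ~/Plots/State UNMP
stFREDPlot(stUR, nm = "State UNMP", ptitle = "State Unemployment Rate", begin = "2012-01-01")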


As seen in the code, I have included some limited date subsetting functionality, and a plot is saved for each available date.  This can produce a very large number of images if a wide date range is selected, but this iterative design will come in handy in part three of this series: animation.



Wednesday, January 22, 2014

GeoGraphing with R; Part 1: Zipcode Mapping

I'd like to share some graphing work I've done with the R programming language.  I have been interested in R for a few years now and have enjoyed the extremely intuitive platform it provides for data analysis.  Although I don't make much use of the powerful statistical tools R provides, I've found that this is part of R's charm: it provides a platform for nearly any use you could need, with an interface that feels as intuitive as Python.  I keep R on my personal Ubuntu and Windows machines, use it at work, and have even installed it on my Raspberry Pis.

I am a big fan of the RStudio IDE, which adds editing and data/file management features to the utilitarian basic R GUI.  I have also tested similar code on a Raspberry Pi, where R installs with a simple call to apt-get.


After seeing a demonstration of some of the geographical features of Tableau (GIS-lite within their visual analytics platform), I was inspired to experiment with mapping visuals, for free.

Using the wonderful wealth of user packages, I was able to get started on this quickly with some tutorials and documentation I found.  I am especially indebted to Jeffrey Breen, the creator of the zipcode package, whose tutorial I found immensely helpful in creating this particular chart.  This charting program is built around plotting latitude and longitude points against a contiguous United States map defined by state borders.  Since both layers share the same latitude/longitude coordinate system, the points and the borders line up exactly.


This particular chart is a version of a project I created for work, plotting the locations of bank branches for the top five banks by number of branches.  In an era of thin branch banking, deep networks of brick-and-mortar branches aren't always considered key to retail banking success, but this type of analysis is still useful.  This program is based on the publicly available branch location data from the FDIC, downloaded as csv files and parsed by R into data.frame objects.  I have yet to find a public API for this data; bonus points to anyone who has.

The code below makes use of the zipcode package mentioned above as well as the ever-useful ggplot2 graphing library.  It is ready to run on any R platform with these packages installed.


#Load needed libraries (note that zipcode also provides a dataset)
library(zipcode)
library(ggplot2)
data(zipcode)
 
#Read and format .csv's downloaded from the FDIC 
#Source http://research.fdic.gov/bankfind/
#csv's were renamed to the stock ticker of each bank but are otherwise unchanged
#The raw csv's include 7 rows of metadata; skipping them allows row 8 to be used as headers
#Since Zip and Bank are all we care about for now, other columns are ignored
#Bank name is added to allow aggregation by entity later
#I've created a quick function for importing the data
readBank <- function(filename) {
  bank <- read.csv(paste(filename,".csv",sep=""), header=TRUE,skip=7)
  bank$Bank <- filename
  bank
}
WFC <- readBank("WFC")
JPM <- readBank("JPM")
BAC <- readBank("BAC")
USB <- readBank("USB")
PNC <- readBank("PNC")
 
#Concatenate bank files together
top5 <- rbind(WFC,JPM, BAC, USB, PNC)
#merge five bank set with zipcode to make mapping possible
top5Zip <- merge(zipcode,top5, by.x= "zip",by.y ="Zip" ) 
 
#Much of the following has been taken from Jeffrey Breen at http://jeffreybreen.wordpress.com/2011/01/05/cran-zipcode/
#Begin building the plot.  Colors denote bank names.  "size" is increased to enhance the final plot
g <- ggplot(data=top5Zip) + geom_point(aes(x=longitude, y=latitude, colour=Bank), size = 1.25)
 
#Simplify display and limit to the "lower 48"
#Some banks have Alaska branches (specifically Wells Fargo in this data); these points are read in but fall outside the plot limits
g <- g + theme_bw() + scale_x_continuous(limits = c(-125,-66), breaks = NULL)
g <- g + scale_y_continuous(limits = c(25,50), breaks = NULL)
 
#Don't need axis labels
g <- g + labs(x=NULL, y=NULL)
g <- g + borders("state", colour="black", alpha=0.5)
g <- g + scale_color_brewer(palette = "Set1")
#Arbitrary title
g <- g + ggtitle("Top Five Banks by Number of Branches") + theme(plot.title = element_text(lineheight=.8, face="bold"))
g <- g+ theme(legend.direction = "horizontal", legend.position = "bottom", legend.box = "vertical")
g

Following the creation of this plot I usually use the ggplot2 ggsave feature to save the plot to an image file:
ggsave("branches5.png", plot=g)


The resulting plot:


As seen with the simplicity of the merge statement, you could substitute nearly any zipcode-based data.  Other charts I've created have included asset locations and temperature data.
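For example, any data frame with a zip code column can take the place of the branch data.  Here is a toy sketch with made-up points (the assets data frame is invented purely for illustration) just to show the swap:

#Toy example: plot a hypothetical zip-coded dataset instead of bank branches
assets <- data.frame(Zip = c("22314", "23185", "63102"), Label = "Example Sites")
assetZip <- merge(zipcode, assets, by.x = "zip", by.y = "Zip")
g2 <- ggplot(data = assetZip) + geom_point(aes(x = longitude, y = latitude, colour = Label), size = 3)
g2 + theme_bw() + borders("state", colour = "black", alpha = 0.5)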


As a preview, the next R GeoGraphing Post will focus on state level mapping data, and includes some animation tricks.

Sunday, January 19, 2014

Favorite Tools: FRED and St. Louis Fed Research Tools

I'd like to use this series as a set of love notes on my favorite data tools.  Some of these I use almost constantly at work, others are personal favorites I have come across.


FRED is a tool I came across a few years ago while reading economics blogs.  The distinctive color of a standard FRED graph (with obligatory recession shading) was something I began to associate with the econ blogger crowd.  It seems this has been noticed by many; Paul Krugman, whose blog was one of the first places I noticed FRED, is quoted as saying, "I think just about everyone doing short-order research — trying to make sense of economic issues in more or less real time — has become a FRED fanatic."

After using these tools at work and home I have come to feel the same way about the tool, even evangelizing its merits to my coworkers and friends.

FRED graphs are distinctive and immediately recognizable


In my work in data analysis at a national bank, I have come to greatly value FRED for two main reasons: it is a singularly well-organized and well-populated database, and it allows immediate reference to data that is often useful in a one-off fashion.  Pulling this data out during a meeting has more than once garnered some recognition of my economic knowledge which might not have otherwise occurred.

The breadth of data available is somewhat astounding.  International data might usually take you all over the web and to a few commercial sites, but FRED has enough to do most high-level macroeconomic survey work.  I find the somewhat more obscure metrics very interesting at times, and it's fun to eyeball them for trends.

It's too easy to make weird charts...


After discovering FRED's website I was ecstatic to find that an Excel Add-In had been developed.  I immediately made use of the feature and made sure I spread the news around.  Being able to quickly pull in common economic data while doing simple (or complex) analysis can save a lot of time.  Outsourcing the data storage and update costs to FRED is wonderful.  Being able to cut down on some of the user table creation and maintenance I owned has been a real time saver.

In order to facilitate access to my company's internal economic data hub, I even created my own version of the FRED Excel Add-In, which I named ED.  Using some simple VBA GUI elements (drop-downs, radio buttons, many MsgBox's...) and an ODBC connection I was able to mimic the Excel Add-In functionality of FRED.  Adding in some charting code I was able to mimic the distinctive graphs as well.  Given that the data is proprietary, I don't see any issue in my imitation of FRED, and I view it as a labor of love in tribute to the data tool.
Tying FRED into R was an obvious next step, and I've already begun to make use of this data.  Being able to pull this data down into the R environment makes it even easier to manipulate quickly, without the worry of Excel resources (Autosave, I'm looking at you!) or adding the data to a database structure.  An R programming project I'll detail later exhibiting geographical plotting uses similar data; maybe I'll tie FRED in to show off the functionality.
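As a quick taste of that convenience, a couple of lines of quantmod will pull a FRED series straight into R and chart it (UNRATE, the headline US unemployment rate, is just one example series ID):

#Pull the US unemployment rate from FRED and chart it
library(quantmod)
getSymbols("UNRATE", src = "FRED")
chartSeries(UNRATE, theme = "white")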

I also happily own the FRED mobile app, which I find entirely too amusing, and has come in handy for wonky discussions, and to prove my data nerdiness to anyone in sight.

If they sold T-shirts, sign me up for two.


The St. Louis Fed includes three other tools: GeoFRED (data mapping), ALFRED (historical economic series), and CASSIDI (a personal favorite of mine, which details US banking industry data).  I believe I'll include love notes on these as well, CASSIDI especially.

Tuesday, January 14, 2014

Google Acquires Nest

In what seems to me like the perfect Instagram snapshot of current technology trends, Google announced yesterday that it has acquired the home automation pioneer, Nest.

Buying Nest, after forays into home energy data and hardware, seems like a great fit for Google.  Similar to the Android platform, Google is once again making use of hardware outsourcing (or corporate crowd sourcing) in order to focus on its core competency, smart data acquisition.

Nest's products are a great example of how known technology can be totally reinvented with the introduction of machine learning and UI enhancements.

The thermostat is no new product, and one of the most common technologies used to sense temperature in thermostats has remained mostly unchanged over the past century since the first patent.  Simple circuits using thermistors can mimic a thermostat for less than $20.  Adding a fancy microcontroller only adds to the fun.  I've built a few temperature-sensing projects which I hope to share in this space.

What makes the Nest thermostats and smoke detectors so exciting is the combination of the simple technology of household appliances with the relatively new and buzzy machine learning approach.  A remote-controlled thermostat is interesting, a Bluetooth-controlled thermostat might be fun, but an intelligent thermostat can actually change the way we interact with the technology.  Sensing and adapting to human behavior makes Nest's products both trendy and useful.



It has been my feeling for some time that consumer technology in this decade will be defined by the marriage of smart technology and big data.  Google's announcement only cements this path, and their place in the development of the machine learning era.

2013: a Christmas Tree

After several years of fumbling with guitar electronics, playing with Arduinos, and now cooking with Raspberry Pis, my interest in the application of DIY electronics has infected the holiday rituals of my girlfriend and me.  Christmas 2013 was smart, in the trendy sense of the term.


Smart and shiny!

Raspberry Pi Powered Web Switch


To make use of my second Raspberry Pi (the first Pi's application to be detailed later!) I chose to try out some simple smart relay techniques.  Using an example and inspiration from a great Make-published RPi book (a great resource; I found every example useful and fun to try), I decided to use my Pi and WiFi to build a hands-free Christmas light setup.  Following the timeless ideal of a creative solution to a sometimes disproportionate problem, I used my Pi to build a web server based remote for our tree's lights.

This project is a modified form of one found in Matt Richardson's RPi book mentioned above, also found at his website (specifically the WebLamp examples).  The script used for the tree lights was gratefully adapted from the example found at these sources.


Materials (In order of coolness)
Raspberry Pi Model B: my first, totally worth the frantic refreshing and wait after pre-order
Power Switch Tail II: such a great tool, makes me feel like an electrician without trips to the ER
Adafruit T-Cobbler GPIO Breakout: I have both the standard and "T"; the T shape looks cool




 Hardware Connection

Pin 25 of RPi/Cobbler to +in of PowerSwitch (controls PSwitch relay)
Ground of RPi/Cobbler to -in of PowerSwitch



Software

I won't repeat all of the great work featured at Matt Richardson's site, except for the alterations I made.  The project is based around some simple Python work using the Flask framework, which can be used to support a simple web server and more.  The Python code is separated into multiple pieces: the main script and a templates directory (the main HTML used to create the Christmas Lights webpage).  The modular design makes it easy to modify the webpage for different applications.

By following all of the instructions at these resources you should arrive at a workable Flask-based web server accessible through your Pi's local IP address, or at http://raspberrypi.local for Apple products or Bonjour-enabled devices.

I've updated the HTML in the python files to be a little more festive, but this is purely cosmetic.

Additionally, I added a wrapper shell script to the /etc/init.d directory on the Pi (run with sudo) and updated the default boot list to include this shell program.
The wrapper shell script includes the following commands:
sudo nohup python /home/pi/WebLamp/weblamp.py &
Note that the home directory may be different based on your Linux distro and configuration.

With these steps and modifications I was able to control our lights from a cell phone (or any browser-capable device on the WiFi network; Flask is very forgiving).  My girlfriend loved the functionality, and it added another personal touch to our decorations.


eOrnaments


In addition to the WiFi switch, I added a couple more electronics decorations to our tree.


Is there ever a bad time for a Ping)))?

Incorporating an earlier electronics project, this year we added an electronic advent countdown to the tree.


3 alligator clips clipping...

I made this device from Wicked Device's Day Counter kit.  I originally used the kit as a scheduling aid at the office, but liked the idea of an active decoration.  The day counter is powered by a micro-USB breakout (huge fan) and an old cell phone charger.


Almost sad to break down the project; it's definitely made us keep the tree up longer this year

Metadata

Part of the motivation for writing has been my slow realization of the impact my work in data has had on my outlook.  Instead of seeing streaming bits of green CRT nonsense floating in front of my face I have become awakened to the ubiquity of data (recorded and potential) indifferently existing in our world.

Among other straw men, I might say there are two extreme reactions to the idea of the digitization of reality.  One position may state that the mining of data from our existence has a corrupting effect on the real-time bio-analog experience.  "Why look for patterns in clouds when we can use machine learning to build, then mine, Clouds?"  Another viewpoint may be imagined as the enthusiastic defense of the plugged-in lifestyle, where the internet of things includes the tweeting toaster and the social media kegerator.

I have come to view the data revolution as a positive influence, and hope to learn what I can and perhaps play some part in my generation's version of the dawn of the transistor era.  I understand the legitimate concerns about enabling a sometimes narcissistic or detached attitude, and the privacy/security risks of a data-rich world.  Not in spite of but because of these concerns, I feel it is important to keep an optimistic view of data and electronic enhancements.  While market competition may provide some of the necessary impetus for invention, an overcautious pessimism can stifle the risks necessary to create.  Optimism and the tilting at green-energy windmills are among the most sustainable fuel sources for creative progress in data and electronics.


I hope to use this space to document my efforts in learning and building data driven projects and give back to the online community of digital artists and inventors I have come to love.  I may even build a tweeting trash can...