Friday, June 27, 2014

Article: Open CPU Database; Race Your Favorite Processors Head-to-Head

I thought I'd share “CPU DB: Recording Microprocessor History”, written by members of the Stanford-sponsored team that put together the database. I highly recommend this article; I think it is a great example of how data, science, and data science can work with the hardware industry and the broader computer science field.


The CPU DB (cpudb.stanford.edu) is an open database, curated by a group at Stanford, that stores information on processors throughout history. While the goals addressed in the article mostly concern using the database to analyze and predict processor performance, it is intended for use by researchers with any subject in mind. I was especially impressed by how accessible the DB is, with multiple ways to interact with the data and example code for analysis. The article is both an introduction to the database by its proud parents and a demonstration of the analytical power it allows.

Trendy Processors
The main research subject demonstrated by the authors is the use of historical data to test for trends in the manufacture and performance of processors over time. The authors work through several examples, with graphical representations and statistical insights. Addressing the ubiquitous laws of Moore and Pollack, the authors use data from the DB to show performance gains over time and the impact of the transistor density and die area that manufacturing advances afford. Despite my lack of deep knowledge of hardware and architecture, this aspect was the most interesting to me. While many who have worked in the hardware industry may know the history of the decisions made to improve CPUs, this DB gives a bird's-eye view of what the actual results have been.

While we sometimes take for granted today that processors inevitably improve over time, the CPU DB gives us many parameters with which to answer empirically the question of how these improvements were made. Among many other examples, the authors show that while clock frequency did increase dramatically from the 1980s to the mid-2000s, the introduction of multi-core processors has all but halted progress on single-core clock frequency in the past decade. This fact, while somewhat obvious as a macro trend, is made more interesting by the detail the CPU DB provides. Citing the constant improvements in compiler optimization, the authors show that although clock frequencies have been stagnant for some time, even single-core performance has been increasing (albeit very slightly) in the last decade. This fact might be overlooked by a more narrowly focused industry report or by consumer-oriented technology journalism.


Open Analysis
The idea this article presents is fascinating to me, and I’m especially excited by the open data. Before I got to page two I was already thinking of how I might chop this data up with R. Much to my surprise, when I went to the “Downloads” page I found that the team had posted some sample R code. Just for fun, I put together a quick chart of my own based on the CPU DB data: a wonky-looking count of transistors per die over time.



R script:
library(ggplot2)

# Load the CPU DB processor table
processor <- read.csv("processor.csv")

# Drop rows missing a date or a transistor count
processor <- processor[!is.na(processor$date) & !is.na(processor$transistors), ]

# Parse dates and keep processors from 2000 onward
processor$date <- as.Date(processor$date)
processor <- processor[processor$date >= as.Date("2000-01-01"), ]

# Plot transistor count over time
ggplot(data = processor, aes(x = date, y = transistors)) +
  geom_line(colour = "blue", size = 1) +
  ggtitle("Transistor Count 2000 to Today")
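
For a rough feel of how a Moore's-law style trend could be checked against this kind of data, here is a minimal sketch that estimates a doubling time from a log-linear fit. It assumes the same date and transistors columns used in the script above. The data frame is synthetic (built to double roughly every two years) rather than taken from the CPU DB, and doubling_time_years is a hypothetical helper name, not something from the database's sample code.

```r
# Synthetic stand-in for processor.csv: these values are NOT from CPU DB.
# They are constructed to double roughly every two years, for illustration only.
processor <- data.frame(
  date = as.Date(c("2000-01-01", "2002-01-01", "2004-01-01",
                   "2006-01-01", "2008-01-01", "2010-01-01")),
  transistors = 40e6 * 2^(0:5)
)

# Hypothetical helper: estimate the doubling time (in years) by regressing
# log2(transistors) on time. The slope is doublings per year, so its
# reciprocal is years per doubling.
doubling_time_years <- function(df) {
  df$years <- as.numeric(df$date) / 365.25  # days since 1970-01-01 -> years
  fit <- lm(log2(transistors) ~ years, data = df)
  1 / coef(fit)[["years"]]
}

print(doubling_time_years(processor))
```

To try this on the real data, swap the synthetic data frame for the filtered processor table from the script above. The chart gets wonky when transistor counts are plotted on a linear axis, so fitting on a log2 scale is one way to make an exponential trend easier to see.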
