Yet another of Hadley Wickham’s packages that I’ve come to rely on to get me out of a bind from time to time is reshape2. It has a number of helpful functions for munging data into formats that make display, plotting, and analysis easier. The problem I run into is that I’ll use one of the reshape2 functions in a project, then months later forget its syntax and have to go looking for my old code or a post that gets me back on the right track.
This is where I found myself this morning: stuck on a problem I would usually solve with tapply, but because I had more than one grouping factor, I had to get from A to Z using summaryBy (from the doBy package) followed by a reshape2 function (in this case, dcast). After some searching around, I found this very helpful post on Stack Overflow that got me going in the right direction.
http://stackoverflow.com/questions/9621401/aggregation-requires-fun-aggregate-length-used-as-default
The key piece of code is found below:
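library(reshape2)

## A toy dataset, with one row for each combination of variables
set.seed(1)  # make the random values reproducible
d <- expand.grid(Hostname = letters[1:2],
                 Date = Sys.Date() + 0:1,
                 MetricType = LETTERS[3:4])
d$Value <- rnorm(nrow(d))

## A second dataset, in which one combination of variables is repeated
d2 <- rbind(d, d[1, ])

## Runs without complaint: every cell holds exactly one value
## Note: without value.var, dcast() guesses the value column (a column named
## "value", otherwise the last column); if you have more than one value
## variable, declare value.var = "MYVAR" explicitly
dcast(d, Hostname + Date ~ MetricType, value.var = "Value")

## One cell now has two values: dcast() complains (a warning or a message,
## depending on the reshape2 version) that an aggregation function is missing
## and falls back to length(), so the cells hold counts rather than values
dcast(d2, Hostname + Date ~ MetricType, value.var = "Value")

## Happy again, with a supplied aggregation function
dcast(d2, Hostname + Date ~ MetricType, value.var = "Value", fun.aggregate = mean)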
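To tie this back to the two-step approach (summarise first, then reshape), here is a minimal sketch comparing that route with a single dcast() call that aggregates on its own. It assumes reshape2 is installed, and it uses base R’s aggregate() as a stand-in for doBy’s summaryBy() purely to avoid an extra dependency. The object names (agg, two_step, one_step, counts) and the second value column Other are hypothetical, made up for illustration.

```r
library(reshape2)

# Toy data as above: one row per Hostname x Date x MetricType
set.seed(42)
d <- expand.grid(Hostname = letters[1:2],
                 Date = Sys.Date() + 0:1,
                 MetricType = LETTERS[3:4])
d$Value <- rnorm(nrow(d))
d2 <- rbind(d, d[1, ])  # duplicate one combination of variables

# Two-step route: collapse to one row per cell, then reshape.
# aggregate() stands in for doBy::summaryBy() here (an assumption, not the post's code).
agg <- aggregate(Value ~ Hostname + Date + MetricType, data = d2, FUN = mean)
two_step <- dcast(agg, Hostname + Date ~ MetricType, value.var = "Value")

# One-step route: let dcast() do the aggregation via fun.aggregate
one_step <- dcast(d2, Hostname + Date ~ MetricType,
                  value.var = "Value", fun.aggregate = mean)

# Without fun.aggregate, the duplicated cell triggers the length() fallback;
# the warning/message is suppressed here so the counts can be inspected
counts <- suppressWarnings(suppressMessages(
  dcast(d2, Hostname + Date ~ MetricType, value.var = "Value")
))

# With a second (hypothetical) value column, value.var picks which one to spread
d$Other <- d$Value * 10
wide_other <- dcast(d, Hostname + Date ~ MetricType, value.var = "Other")
```

Both routes should give the same wide table; the one-step version simply skips the intermediate summary data frame. The counts table is a quick way to spot which cells had duplicates before choosing an aggregation function.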