Yet another one of Hadley Wickhams packages that I’ve come to use to get me out of a bind from time-to-time has been rehsape2. There are a number of helpful functions that deal with munging data into formats that allow for easier display/plotting/analysis. The problem that I run into is that I’ll use one of the reshape functions in a project, then months later, I’ll forget the reshape2 syntax for a certain function and I’ll have to go looking to find my old code or a post that gets me back on the right track.
This is where I found myself this morning, stuck with a situation that I would usually use tapply to solve, and on the account of having >1 grouping factor, I had to get from A to Z using both summaryBy (doBy) followed by a reshape function (in this case it was dcast). After some searching around, I found this most helpful post on stack overflow that got me going in the right direction.
http://stackoverflow.com/questions/9621401/aggregation-requires-fun-aggregate-length-used-as-default
The key piece of code is found below:
library(reshape2)
## A toy dataset, with one row for each combination of variables
d <- expand.grid(Hostname = letters[1:2],
Date = Sys.Date() + 0:1,
MetricType = LETTERS[3:4])
d$Value <- rnorm(seq_len(nrow(d)))
## A second dataset, in which one combination of variables is repeated
d2 <- rbind(d, d[1,])
## Runs without complaint
dcast(d, Hostname + Date ~ MetricType)
## note-if you have greater than 1 value variable you will need to declare value.var="MYVAR" or else will default aggregate to length()
## Throws error asking for an aggregation function
dcast(d2, Hostname + Date ~ MetricType)
## Happy again, with a supplied aggregation function
dcast(d2, Hostname + Date ~ MetricType, fun.aggregate=mean)