November | 2012 | failuretoconverge

shiftday <- function(FD, DS) {
  # x[i] holds the shifted value for row i
  x <- rep(0, length(FD))
  for (i in seq_along(FD)) {
    if (i == 1) {
      # the first row is never shifted
      x[i] <- FD[i]
    } else {
      # where the previous (shifted) row ends
      prev_end <- x[i - 1] + DS[i - 1]
      # keep FD[i] if there is no overlap, otherwise push it to prev_end
      if (prev_end - FD[i] <= 0) x[i] <- FD[i] else x[i] <- prev_end
    }
  }
  x
}

Yet another one of Hadley Wickham’s packages that I’ve come to rely on to get me out of a bind from time to time is reshape2. It has a number of helpful functions for munging data into formats that allow for easier display, plotting, and analysis. The problem I run into is that I’ll use one of the reshape functions in a project, then months later I’ll forget the reshape2 syntax for a certain function and have to go looking for my old code or a post that gets me back on the right track.

This is where I found myself this morning, stuck with a situation that I would usually solve with tapply. On account of having more than one grouping factor, I had to get from A to Z using summaryBy (doBy) followed by a reshape function (in this case, dcast). After some searching around, I found this most helpful post on Stack Overflow that got me going in the right direction.

http://stackoverflow.com/questions/9621401/aggregation-requires-fun-aggregate-length-used-as-default
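
For anyone who wants to see the summarise-then-reshape route spelled out, here is a rough sketch. The `sales` data frame and its column names are made up for illustration, and base `aggregate()` stands in for summaryBy so the example only needs reshape2. It also shows that dcast can do the aggregation itself when given `fun.aggregate`.

```r
library(reshape2)

# Hypothetical long-format data: two grouping factors and one measurement
sales <- data.frame(
  region  = c("east", "east", "east", "west", "west", "west"),
  product = c("x", "x", "y", "x", "y", "y"),
  amount  = c(10, 20, 5, 8, 2, 4)
)

# Step 1: summarise by both factors (aggregate() is a stand-in for summaryBy)
summed <- aggregate(amount ~ region + product, data = sales, FUN = mean)

# Step 2: spread one factor across the columns
wide_two_step <- dcast(summed, region ~ product, value.var = "amount")

# The same table in one call, letting dcast do the aggregation
wide_one_step <- dcast(sales, region ~ product,
                       value.var = "amount", fun.aggregate = mean)
```

Both results should hold the same cell means; the one-call version just skips the intermediate data frame.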

The key piece of code is found below:

library(reshape2)

## A toy dataset, with one row for each combination of variables
d <- expand.grid(Hostname = letters[1:2],
                 Date = Sys.Date() + 0:1,
                 MetricType = LETTERS[3:4])
d$Value <- rnorm(nrow(d))

## A second dataset, in which one combination of variables is repeated
d2 <- rbind(d, d[1,])

## Runs without complaint
dcast(d, Hostname + Date ~ MetricType)
## Note: if you have more than one value variable, declare value.var="MYVAR" explicitly;
## otherwise dcast guesses the value column, and it falls back to length() whenever
## the formula variables do not uniquely identify each row

## Complains that no aggregation function was supplied and falls back to length()
dcast(d2, Hostname + Date ~ MetricType)

## Happy again, with a supplied aggregation function
dcast(d2, Hostname + Date ~ MetricType, fun.aggregate=mean)
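
To make the difference between the length() fallback and a supplied aggregation function concrete, here is a small sketch with fixed numbers instead of rnorm(), so the cells can be checked by hand. The `toy` data frame is invented for this illustration; only `dcast`, `value.var`, and `fun.aggregate` come from reshape2.

```r
library(reshape2)

# Hypothetical toy data; the (a, C) combination appears twice
toy <- data.frame(
  Hostname   = c("a", "a", "b", "b", "a"),
  MetricType = c("C", "D", "C", "D", "C"),
  Value      = c(1, 2, 3, 4, 5)
)

# What the fallback produces: a count of rows per cell, not the values
counts <- dcast(toy, Hostname ~ MetricType,
                value.var = "Value", fun.aggregate = length)

# What is usually wanted: the duplicated cell collapsed with mean()
means <- dcast(toy, Hostname ~ MetricType,
               value.var = "Value", fun.aggregate = mean)

counts  # the (a, C) cell is 2, every other cell is 1
means   # the (a, C) cell is (1 + 5) / 2 = 3
```

A table full of 1s and 2s where measurements were expected is a good sign that dcast fell back to length().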


looping with if (the importance of vector indices) to solve something that has plagued me for a while (shifting fill dates)

A helpful hint about the proper use of dcast