looping with if (the importance of vector indices) to solve something that has plagued me for a while (shifting fill dates)

This week I toyed around with some code to shift fill dates for persistence analysis, and I think I’ve stumbled on a workable solution that involves a for loop with an if/else test and the use of vector indices. It’s a synthesis of a few concepts that have been bouncing around in my head for quite some time, the detritus settling into a more well-behaved gel thanks to what I’ve managed to pick up from my ongoing exploration of the Matloff text I’m nibbling on. The problem involves shifting the fill date of any medication fill that overlaps with the fill date + days supply of a previous fill (or fills): the overlapping fill is pushed forward to the day the previous supply runs out. This intermediate step is addressed in SAS code by Scott Leslie in this SUGI paper (http://www.wuss.org/proceedings11/Papers_Chu_L_74886.pdf). The solution I’ve come up with is probably not as elegant as it could be (I’m sure there are some vectorization efficiencies to be had), but nonetheless, I offer it up here:

FD is the vector of fill dates

DS is the vector of days supplies

Also, note that for the function to work on an entire data.frame of subjects, you have to use something like Hadley Wickham’s plyr to execute the function by member, using ddply with transform.


shiftday <- function(FD, DS) {
  # FD: fill dates for one member, sorted ascending
  # DS: days supply for each fill
  x <- FD  # start from the original fill dates
  for (i in seq_along(FD)) {
    if (i > 1) {
      # day the previous (already shifted) fill runs out
      prev_end <- x[i - 1] + DS[i - 1]
      # shift the current fill only when it overlaps the previous supply
      if (prev_end > FD[i]) x[i] <- prev_end
    }
  }
  x
}

Unfortunately, the text formatting of WordPress is not friendly to displaying code, so the layout may be less than ideal, but you get the picture.
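To make the shifting rule concrete, here is a minimal sketch with made-up data. It restates the rule in a compact helper so the block runs on its own; the helper name shift_fills, the column names member, FD and DS, and the numbers themselves are my own assumptions for illustration, and the fill dates are plain day numbers rather than Date objects. It then applies the rule by member with ddply and transform from plyr.

```r
library(plyr)

# Hypothetical compact restatement of the shifting rule: each fill starts
# on its own fill date or on the day the previous (shifted) fill runs out,
# whichever is later.
shift_fills <- function(FD, DS) {
  x <- FD
  for (i in seq_along(FD)[-1]) {
    x[i] <- max(FD[i], x[i - 1] + DS[i - 1])
  }
  x
}

# Made-up fills for two members; FD is a day number, DS is days supply.
fills <- data.frame(
  member = c("A", "A", "A", "B", "B"),
  FD     = c(1, 25, 50, 1, 40),
  DS     = c(30, 30, 30, 30, 30)
)

# Member A alone: the second fill overlaps the first (1 + 30 = 31 > 25),
# so it moves to day 31. The third fill is then compared against the
# shifted date (31 + 30 = 61 > 50), so it moves to day 61.
one_member <- shift_fills(c(1, 25, 50), c(30, 30, 30))

# All members: ddply splits by member and transform adds the shifted date.
shifted <- ddply(fills, .(member), transform,
                 shifted_FD = shift_fills(FD, DS))
shifted
```

Member B shows the no-overlap case: the first supply runs out on day 31, the second fill is on day 40, so nothing moves. The reason the loop reads x[i - 1] rather than FD[i - 1] is visible in member A, where the third fill has to be measured against the already shifted second fill.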

A helpful hint about the proper use of dcast

Yet another one of Hadley Wickham’s packages that I’ve come to use to get me out of a bind from time to time is reshape2. It has a number of helpful functions for munging data into formats that allow for easier display/plotting/analysis. The problem I run into is that I’ll use one of the reshape functions in a project, then months later I’ll forget the reshape2 syntax for a certain function and have to go looking for my old code or a post that gets me back on the right track.

This is where I found myself this morning, stuck with a situation that I would usually use tapply to solve. On account of having more than one grouping factor, I had to get from A to Z using summaryBy (doBy) followed by a reshape function (in this case, dcast). After some searching around, I found this most helpful post on Stack Overflow that got me going in the right direction.

http://stackoverflow.com/questions/9621401/aggregation-requires-fun-aggregate-length-used-as-default

The key piece of code is found below:

library(reshape2)

## A toy dataset, with one row for each combination of variables
d <- expand.grid(Hostname = letters[1:2],
                 Date = Sys.Date() + 0:1,
                 MetricType = LETTERS[3:4])
d$Value <- rnorm(nrow(d))

## A second dataset, in which one combination of variables is repeated
d2 <- rbind(d, d[1,])

## Runs without complaint
dcast(d, Hostname + Date ~ MetricType)
## note: if you have more than one value variable, declare value.var="MYVAR" so dcast does not have to guess which column to cast

## Complains that no aggregation function was supplied and falls back to length()
dcast(d2, Hostname + Date ~ MetricType)

## Happy again, with a supplied aggregation function
dcast(d2, Hostname + Date ~ MetricType, fun.aggregate=mean)
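The value.var note in the code above is easier to see with a second value column in play. Here is a small sketch on made-up data (d3, d4, and the Weight column are hypothetical names I’ve added for illustration): naming value.var picks the column to spread, and fun.aggregate collapses any repeated combination.

```r
library(reshape2)

# Made-up data: one row per Hostname/MetricType pair, two value columns.
d3 <- expand.grid(Hostname = letters[1:2], MetricType = LETTERS[3:4])
d3$Value  <- c(1, 2, 3, 4)
d3$Weight <- c(10, 20, 30, 40)

# Name the column to spread instead of letting dcast guess.
wide_value <- dcast(d3, Hostname ~ MetricType, value.var = "Value")

# Add a second observation for the first combination (a, C) ...
dup <- d3[1, ]
dup$Value <- 5
d4 <- rbind(d3, dup)

# ... and collapse the repeat with an explicit aggregation function.
wide_mean <- dcast(d4, Hostname ~ MetricType,
                   value.var = "Value", fun.aggregate = mean)
wide_mean
```

In wide_mean, the (a, C) cell holds the mean of 1 and 5, while every other cell matches wide_value. Leaving fun.aggregate off in that second call would give counts rather than values, which is the trap the Stack Overflow post describes.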