Today, I came across a post from the ‘What you’re doing is rather desperate’ blog that dealt with a common issue, and something I deal with on an (almost) daily basis. It is, in fact, such a common issue that I have a script that does all the work for me, and it was good to dive back in for a refresh of something I wrote quite a while ago.
N. Saunders posts a much cleaner solution than mine, but mine avoids the issues that can arise when you have non-unique values as maximums (or minimums). My solution also avoids the merge() function, which, in my experience, can be a memory and time hog. See below for my take on solving his issue.
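To illustrate the point about non-unique maximums, here is a minimal sketch (my own toy example, not code from either post) of the kind of aggregate()-plus-merge() pattern being compared against: when a group's maximum is tied, the merge returns every tied row, so that group appears more than once in the result.

## Illustrative only -- not from the original post
df <- data.frame(vars = c('A', 'A', 'B'), obs1 = c(6, 6, 7))  ## note the tie in group A
maxes <- aggregate(obs1 ~ vars, data = df, FUN = max)         ## one maximum per group
merged <- merge(df, maxes)                                    ## keeps BOTH tied 'A' rows
merged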
## First let's create some data (and inject some gremlins)
df.orig <- data.frame(vars = rep(LETTERS[1:5], 2), obs1 = c(1:10), obs2 = c(11:20))
df.orig <- rbind(df.orig, data.frame(vars = 'A', obs1 = 6, obs2 = 15)) ## create some ties
df.orig <- rbind(df.orig, data.frame(vars = 'A', obs1 = 6, obs2 = 16)) ## more ties
## my solution requires that you order your data first
df.orig <- df.orig[order(df.orig$vars, df.orig$obs1, df.orig$obs2), ]
## the row names get scrambled by order(), so we need to re-establish some neatness
row.names(df.orig) <- seq(1, nrow(df.orig))
x1 <- match(df.orig$vars, df.orig$vars)
index <- as.numeric(tapply(row.names(df.orig), x1, FUN = tail, n = 1)) ## here's where the magic happens
df.max <- df.orig[index, ]
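As a quick sanity check (my addition, not part of the original post), each row of df.max should carry its group's maximum of obs1, which you can confirm against a plain tapply():

## Not from the original post: verify that df.max holds the group maxima of obs1
group.max <- tapply(df.orig$obs1, df.orig$vars, max)      ## named vector, one max per group
all(df.max$obs1 == group.max[as.character(df.max$vars)])  ## should print TRUE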