I’ve been playing around with survival functions in R. Just exploring and having fun with them. In one case I was interested in fitting various parametric survival functions to data and I had someone at work obtain what was called internally “digitized” data that was extracted from the survival function of a published manuscript (it was a case where we could not obtain the actual raw survival data) and I wanted to play around with that digitized data to do some parametric curve fitting.
What became apparent at the outset was that many of the parametric fitting functions in R (in the flexsurv or survreg packages) require the user to create survival objects. To convert the digitized data back into raw survival form, I had to create a hackish bit of code. It is not perfect, but it appears to get the job done.
NOTE: I do not intend to do any analysis where I’m computing proportional hazards so the errors are NOT important to me, just the overall shapes of the survival functions and the ability to go interchangeably from digitized to raw from for survival curve fitting.
</span> require(survival) # generate the training data "lung1" lung1 <- lung[c("status", "time")] nrow(lung1)# how many pts are in data # from the training data build KM to obtain survival %s s <- Surv(time=lung1$time, event=lung1$status) km.lung <- survfit(s ~ 1, data=lung1) plot(km.lung) # the kind of data you might have if you have "digitized" data # e.g., extracted from a journal lungv2 <- data.frame(time = km.lung$time, surv = km.lung$surv) # hackish code that converts it back # note you have to name columns "surv" and "time" convert.surv <- function( start.df, sample){ coh.n <- sample df1 <- start.df[c("surv", "time")] for(i in 1:nrow(df1)){ j <- (nrow(df1)+1) - i if( j == nrow(df1)){ repeats <- df1[j, "surv"] * coh.n repeats <- round(repeats, 0) s1 <- rep(0, repeats) t1 <- rep(df1[j, "time"], repeats) } if (j != nrow(df1) & j != 1){ repeats <- (df1[j-1, "surv"] - df1[j, "surv"]) * coh.n repeats <- round(repeats, 0) if( repeats == 0 ) next if( repeats > 0 ){ s1 <- c(s1, rep(1, repeats)) t1 <- c(t1, rep(df1[j, "time"], repeats)) } } if (j == 1){ repeats <- ((1- df1[1, "surv"]) * coh.n) repeats <- round(repeats, 0) s1 <- c(s1, rep(1, repeats)) t1 <- c(t1, rep(df1[j, "time"], repeats)) } } s1.r <- rev(s1) t1.r <- rev(t1) end.data <<- data.frame(time=t1.r, status=s1.r) # place new df in global env } #try it out convert.surv(lungv2, 228) # lets make sure it works properly s <- Surv(time=end.data$time, event=end.data$status) km.lung <- survfit(s ~ 1, data=end.data) plot(km.lung)