Convert survival function back into survival object (hack)

I’ve been playing around with survival functions in R.  Just exploring and having fun with them.  In one case I was interested in fitting various parametric survival functions to data and I had someone at work obtain what was called internally “digitized” data that was extracted from the survival function of a published manuscript (it was a case where we could not obtain the actual raw survival data) and I wanted to play around with that digitized data to do some parametric curve fitting.

What became apparent at the outset was that many of the parametric fitting functions in R (in the flexsurv or survreg packages) require the user to create survival objects.  To convert the digitized data back into raw survival form, I had to create a hackish bit of code.  It is not perfect, but it appears to get the job done.

NOTE: I do not intend to do any analysis where I’m computing proportional hazards so the errors are NOT important to me, just the overall shapes of the survival functions and the ability to go interchangeably from digitized to raw from for survival curve fitting.


# generate the training data "lung1"
lung1 &lt;- lung[c("status", "time")]
nrow(lung1)# how many pts are in data

# from the training data build KM to obtain survival %s
s &lt;- Surv(time=lung1$time, event=lung1$status)
km.lung &lt;- survfit(s ~ 1, data=lung1)

# the kind of data you might have if you have "digitized" data
# e.g., extracted from a journal
lungv2 &lt;- data.frame(time = km.lung$time, surv = km.lung$surv)

# hackish code that converts it back
# note you have to name columns "surv" and "time"
convert.surv &lt;- function( start.df, sample){
coh.n &lt;- sample
df1 &lt;- start.df[c("surv", "time")]
for(i in 1:nrow(df1)){
j &lt;- (nrow(df1)+1) - i
if( j == nrow(df1)){
repeats &lt;- df1[j, "surv"] * coh.n
repeats &lt;- round(repeats, 0)
s1 &lt;- rep(0, repeats)
t1 &lt;- rep(df1[j, "time"], repeats)
if (j != nrow(df1) &amp; j != 1){
repeats &lt;- (df1[j-1, "surv"] - df1[j, "surv"]) * coh.n
repeats &lt;- round(repeats, 0)
if( repeats == 0 )
if( repeats &gt; 0 ){
s1 &lt;- c(s1, rep(1, repeats))
t1 &lt;- c(t1, rep(df1[j, "time"], repeats))
if (j == 1){
repeats &lt;- ((1- df1[1, "surv"]) * coh.n)
repeats &lt;- round(repeats, 0)
s1 &lt;- c(s1, rep(1, repeats))
t1 &lt;- c(t1, rep(df1[j, "time"], repeats))
s1.r &lt;- rev(s1)
t1.r &lt;- rev(t1) &lt;&lt;- data.frame(time=t1.r, status=s1.r)  # place new df in global env

#try it out
convert.surv(lungv2, 228)

# lets make sure it works properly
s &lt;- Surv($time,$status)
km.lung &lt;- survfit(s ~ 1,


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s