Cleaning up detailed D.0 compound claim files–many columns of data into one cell

compound_medicinesOne of the burning client issues that I find I am focused on–as of late–is a disturbing increase in the proportion of very high cost compounded drug claims. While I am not going to spend any time debating the clinical value of these claims, I think anyone would agree that a good first step in forming an evidence-based approach to developing an opinion on these claims is to examine them (How many do you see? What do they cost? Does the costing basis make sense to you? What ingredients go into them? etc.,).

In the new D.0 claim format, compounding pharmacies are given the ‘opportunity’ to list each individual ingredient used to make up the compounded product. So one of the things you may want to do is take all of the line-by-line ingredients that make up a claim and create data where each set of ingredients are aggregated into one line. This is a helpful chunk of code that does just that. It makes use of both Hadley’s dcast() function–part of the reshape2 package, and the apply() function (one of a very handy set of _apply functions in R).

#Make some data
ingredient_name <- c("Drug A Phosphate Cap", "Flavor", "*Distilled Water*", 
"Drug X Inj Susp", "Lidocaine", 
"Super Drug HCl Liquid", "Not that great Inj Susp", 
"Antibiotic HCl Cap", "Drug A HCl Liquid", "Antifungal (Bulk)", 
"Table Salt Bicarbonate (Bulk)", "Antifungal (Bulk)", "Table Salt Bicarbonate (Bulk)", 
"Drug A Phosphate Cap", "Flavors", "*Distilled Water*", 
"Drug A Phosphate Cap", "Antifungal (Bulk)", "Table Salt Bicarbonate (Bulk)", 
"Super Drufg Acetonide Inj Susp", "Emollient**", 
"Antifungal (Bulk)", "Table Salt Bicarbonate (Bulk)")
 claim_id <- c(1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5, 6, 6, 6, 7, 8, 8, 9, 
9, 10, 10)
df1 <- data.frame(claim_id, ingredient_name)
#Let's look at what you have
df1
df2 <- dcast(df1, claim_id~ingredient_name)
#NOTE: fantastic side effect of dcast is that it alphabetizes columns by column name
#This ensures that for different claims the ordering of the ingredients follows the same rules!
cols <- names(df2)[-1]
df2$all_ingred <- apply( df2[ , cols ] , 1 , paste , collapse = ", " ) # combine all the columns into one
df2$all_ingred<- gsub("NA, |NA", "", df2$all_ingred)#clean up
df2$all_ingred<- gsub(", $", "", df2$all_ingred)#more clean up
df3 <- df2[c("claim_id", "all_ingred")]#even more clean up
#Now give it a look
df3
Advertisements