March | 2015 | failuretoconverge

Welcome to document management 2.0

As part of my workflow, I like to create directories in code so that I can both 1) document when I ran something and 2) ensure that all my work ends up in the same (correct) place. While developing what has become my current workflow, I’ve made some mistakes and learned from them, so I figured it’s worth a quick post to write them down so others can learn from them as well.

Use case: Let’s say I’m doing an analysis of diabetes drug utilization. Since this is something I run monthly for a particular client, I want to document what I did and when, and ensure the output always ends up in the same place. How would that look?


###    First I like to set a "home base" which is a place I want to return to after I'm done
wd_or <- "C:/Users/myname/Documents" ## whatever your usual wd happens to be--mine is the My Documents folder

###    Then build a path to the directory of choice, I like to do this in code 
client.name <- "coporationA"
project.name <- "monthlydiabetes"
today.name <- gsub("-", ".", Sys.Date())
###    Then using dir.create() build the working directory --note that by default you can only do this one level at a time
###    For example, if you have not yet built the client directory, you cannot build a subdirectory within it
###    (unless you pass recursive=TRUE to dir.create(), which builds the missing parent directories too)
###    I know this may seem elementary, but it has tripped me up in the past (also: it's dir.create(), not dir*s*.create())

step1.dir <- paste(wd_or, client.name, sep="/") ## R is case-sensitive: client.name, not Client.name
step2.subdir <- paste(project.name, today.name, sep="_")
##  step1 create directory (on later monthly runs it already exists, so dir.create() just warns and moves on)
dir.create(step1.dir)
##  step2 create subdirectory
dir.create(paste(step1.dir, step2.subdir, sep="/"))
setwd(paste(step1.dir, step2.subdir, sep="/"))

###    Now run whatever you want, then be sure to save your output
iris1 <- iris
iris1$newcol <- "iris"

iris2 <- iris
iris2$newcol <- "another_iris"


not_iris <- data.frame(V1=rep(1, 100), V2=rep(2, 100))

write.csv(iris1, file="iris1.csv", row.names=FALSE)
write.csv(iris2, file="iris2.csv", row.names=FALSE)
write.csv(not_iris, file="not_iris.csv", row.names=FALSE)

###    Then after you're done it's always nice to tidy things up
setwd(wd_or)
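
A more compact variant of the steps above can be sketched with base R only. This is a minimal sketch, not the workflow from this post: `run_in_project_dir` and its arguments are hypothetical names, and it uses `tempdir()` as the base directory so it runs anywhere. It relies on `dir.create(recursive = TRUE)` to build the nested directories in one call, and on `on.exit()` to return to the original working directory even if something fails along the way.

```r
## Hypothetical helper (not from the original post): build a dated project
## directory, run some work inside it, then return to the previous wd.
run_in_project_dir <- function(base.dir, client.name, project.name, fun) {
    today.name <- format(Sys.Date(), "%Y.%m.%d")
    target.dir <- file.path(base.dir, client.name,
                            paste(project.name, today.name, sep = "_"))
    ## recursive = TRUE also creates any missing parent directories;
    ## showWarnings = FALSE keeps re-runs quiet if the directory exists
    dir.create(target.dir, recursive = TRUE, showWarnings = FALSE)
    old.wd <- setwd(target.dir)  # setwd() invisibly returns the previous wd
    on.exit(setwd(old.wd))       # restore "home base" even if fun() errors
    fun()
    invisible(target.dir)
}

## Usage: write one file into a dated directory under tempdir()
out.dir <- run_in_project_dir(
    tempdir(), "clientA", "monthlydiabetes",
    function() write.csv(iris, file = "iris1.csv", row.names = FALSE)
)
file.exists(file.path(out.dir, "iris1.csv"))
```

The trade-off is that the work has to be wrapped in a function, but in exchange there is no way to forget the final `setwd()`.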

Lastly, as a bonus: I’ve recently had cases where, after placing a set of files in a particular directory, I had to grab only a subset of those files and process them. As a toy example, let’s say a week has passed since I created the 3 files above (iris1, iris2, and not_iris), and I now need to go back, append iris1 and iris2 into one file, and do some analysis on the newly created big iris file.

Because I created a specific directory that contains only this work, my job is easy, and I can do it all from the console with code! 🙂


###    First, you must re-set the wd (use the old code file for that)

###    Take all this from the original code 
client.name <- "coporationA"
project.name <- "monthlydiabetes"

###    be sure to replace this with the date you ran the original analysis
today.name <- "2015.03.10"
step1.dir <- paste(wd_or, client.name, sep="/") ## R is case-sensitive: client.name, not Client.name
step2.subdir <- paste(project.name, today.name, sep="_")

###    create a character vector of file names in that directory
file.names.look <- list.files(paste(step1.dir, step2.subdir, sep="/"))

###    select only those files that start with "iris" (e.g., you don't want "not_iris.csv")
subfile.names.look <- unique(file.names.look[ grep("^iris", file.names.look)])

###    you can do this a number of ways, but here's one way that involves a looping construct
target.dir <- paste(step1.dir, step2.subdir, sep="/")
list.temp.master <- vector("list", length(subfile.names.look))
for(i in seq_along(subfile.names.look)){
    file.i <- paste(target.dir, subfile.names.look[i], sep="/")
    ## the files were written with write.csv(), so they are comma-separated (not "|")
    x.cols <- read.csv(file.i, header=TRUE, nrows=1)
    x.col.classes <- rep("character", length(x.cols)) ## I always declare col.classes as character because leading zeros have got me into trouble before (e.g., NDCs and SSNs)!
    df.import <- read.csv(file.i, header=TRUE, colClasses=x.col.classes)
    list.temp.master[[i]] <- df.import
}

### create the newly appended big iris file
all.iris <- do.call("rbind", list.temp.master)

### then do whatever analysis you want
### (every column was imported as character, so convert to numeric first)
mean(as.numeric(all.iris$Sepal.Length))
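
The same select, import, and append steps can also be written without an explicit loop. The following is a sketch under stated assumptions rather than the code I ran for the client: `demo.dir`, `iris.files`, `iris.list`, and `all.iris.demo` are illustrative names, and the files are written to a temporary directory first so the example runs on its own. It uses the `pattern` and `full.names` arguments of `list.files()` to pick the files, and a single `colClasses = "character"` value, which `read.csv()` recycles across all columns.

```r
## Set up a throwaway directory with three files (illustrative only)
demo.dir <- file.path(tempdir(), "iris_demo")
dir.create(demo.dir, showWarnings = FALSE)
write.csv(iris, file = file.path(demo.dir, "iris1.csv"), row.names = FALSE)
write.csv(iris, file = file.path(demo.dir, "iris2.csv"), row.names = FALSE)
write.csv(iris, file = file.path(demo.dir, "not_iris.csv"), row.names = FALSE)

## pattern is matched against the file name only, so "not_iris.csv" is skipped;
## full.names = TRUE returns complete paths, so no setwd() is needed
iris.files <- list.files(demo.dir, pattern = "^iris.*\\.csv$", full.names = TRUE)

## read every file with all columns as character (keeps leading zeros intact)
iris.list <- lapply(iris.files, read.csv, colClasses = "character")

## append everything into one data frame
all.iris.demo <- do.call(rbind, iris.list)

## columns are character, so convert before doing arithmetic
mean(as.numeric(all.iris.demo$Sepal.Length))
```

Reading everything as character is the safe default for identifiers like NDCs, at the cost of an explicit `as.numeric()` wherever a column is actually used as a number.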

Changing and setting working directories in code