As part of my workflow, I like to create directories in code so that I can both 1) document when I ran something and 2) ensure that all my work ends up in the same (correct) place. In developing what has become my current workflow, I’ve made some mistakes that I’ve learned from, and I figured it’s worth a quick post to write these down so others may learn from them as well.
Use case: let’s say I’m doing an analysis of diabetes drug utilization. Since this is something I run monthly for a particular client, I want to document when I did what and ensure that the output always ends up in the same place. How would that look?
### First I like to set a "home base", which is a place I want to return to after I'm done
wd_or <- "C:/Users/myname/Documents" ## whatever your usual wd happens to be; mine is the My Documents folder

### Then build a path to the directory of choice; I like to do this in code
client.name <- "corporationA"
project.name <- "monthlydiabetes"
today.name <- gsub("-", ".", Sys.Date())

### Then using dir.create() build the working directory
### Note: by default you can only do this one level at a time
### For example, if you have not yet built the directory "corporationA" you cannot build a subdirectory within that directory
### I know this may seem elementary, but it has tripped me up in the past (e.g., it's not dir*s*.create it's dir.create)
step1.dir <- paste(wd_or, client.name, sep="/") ## R is case-sensitive: client.name, not Client.name
step2.subdir <- paste(project.name, today.name, sep="_")

## step1 create directory
dir.create(step1.dir)
## step2 create subdirectory
dir.create(paste(step1.dir, step2.subdir, sep="/"))
setwd(paste(step1.dir, step2.subdir, sep="/"))

### Now run whatever you want, then be sure to save your output
iris1 <- iris
iris1$newcol <- "iris"
iris2 <- iris
iris2$newcol <- "another_iris"
not_iris <- data.frame(V1=rep(1, 100), V2=rep(2, 100))
write.csv(iris1, file="iris1.csv", row.names=FALSE)
write.csv(iris2, file="iris2.csv", row.names=FALSE)
write.csv(not_iris, file="not_iris.csv")

### Then after you're done it's always nice to tidy things up
setwd(wd_or)
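The one-level-at-a-time restriction is only the default behaviour: `dir.create()` accepts `recursive = TRUE`, which builds any missing parent directories in a single call. Here is a minimal sketch of that variation. It is not the workflow above verbatim: it uses `tempdir()` as a stand-in home base (an assumption, so the example runs anywhere), `file.path()` in place of `paste(..., sep="/")`, and `format()` to build the date string.

```r
# Stand-in home base; tempdir() is an assumption made for illustration only
wd_or <- tempdir()

client.name  <- "corporationA"
project.name <- "monthlydiabetes"
today.name   <- format(Sys.Date(), "%Y.%m.%d")  # e.g. "2015.03.10"

# file.path() joins its arguments with "/"
out.dir <- file.path(wd_or, client.name,
                     paste(project.name, today.name, sep = "_"))

# recursive = TRUE creates the client directory and the dated subdirectory
# together; showWarnings = FALSE keeps a same-day re-run quiet
dir.create(out.dir, recursive = TRUE, showWarnings = FALSE)

# Writing to a full path avoids changing the working directory at all
write.csv(iris, file = file.path(out.dir, "iris1.csv"), row.names = FALSE)

dir.exists(out.dir)
```

Writing to full paths means there is no working directory to restore afterwards, which may or may not suit how you like to work.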
Lastly, as a bonus, I’ve had some instances recently where, upon placing a set of files in a particular directory, I’ve had to grab only a select subset of those files and do things to those files. In a toy example, let’s say a week has passed since I created the 3 iris files above (iris1, iris2, and not_iris) and I now am required to go back into those files, append iris1 and iris2 into one file and do some analysis on the newly created big iris file.
Because I created a specific directory that contains only this work, my job is easy, and I can do it all from the console with code! 🙂
### First, you must re-set the wd (use the old code file for that)
### Take all this from the original code
wd_or <- "C:/Users/myname/Documents"
client.name <- "corporationA"
project.name <- "monthlydiabetes"
### be sure to replace this with the date you ran the original analysis
today.name <- "2015.03.10"
step1.dir <- paste(wd_or, client.name, sep="/")
step2.subdir <- paste(project.name, today.name, sep="_")
setwd(paste(step1.dir, step2.subdir, sep="/"))

### create a character vector of file names in that directory
file.names.look <- list.files(paste(step1.dir, step2.subdir, sep="/"))

### select only those files that start with "iris" (e.g., you don't want "not_iris.csv")
subfile.names.look <- unique(file.names.look[grep("^iris", file.names.look)])

### you can do this a number of ways, but here's one way that involves a looping construct
list.temp.master <- list(NULL)
for(i in seq_along(subfile.names.look)){
  ## read one row to count the columns (the files were written with write.csv, so they are comma-separated)
  x.cols <- read.csv(paste(getwd(), subfile.names.look[i], sep="/"), header=TRUE, nrows=1)
  ## I always declare col.classes as character because leading zeros have got me into trouble before (e.g., NDCs and SSNs)!
  x.col.classes <- rep("character", length(x.cols))
  df.import <- read.csv(paste(getwd(), subfile.names.look[i], sep="/"), header=TRUE, colClasses=x.col.classes)
  list.temp.master <- c(list.temp.master, list(df.import))
}
list.temp.master <- list.temp.master[-1]

### create the newly appended big iris file
all.iris <- do.call("rbind", list.temp.master)

### then do whatever analysis you want
### (every column was read as character, so convert before doing arithmetic)
mean(as.numeric(all.iris$Sepal.Length))
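As the comment in the code says, there are a number of ways to do this. One alternative, sketched here under stated assumptions, lets `list.files()` do the filtering through its `pattern` argument and replaces the loop with `lapply()`. To keep the example self-contained it first writes three toy files into a temporary directory; `data.dir` and the `iris_demo` folder name are hypothetical stand-ins for the dated project directory.

```r
# Toy setup so the sketch runs on its own (hypothetical directory name)
data.dir <- file.path(tempdir(), "iris_demo")
dir.create(data.dir, showWarnings = FALSE)
write.csv(iris, file.path(data.dir, "iris1.csv"), row.names = FALSE)
write.csv(iris, file.path(data.dir, "iris2.csv"), row.names = FALSE)
write.csv(data.frame(V1 = 1, V2 = 2),
          file.path(data.dir, "not_iris.csv"), row.names = FALSE)

# pattern is matched against the file name; full.names = TRUE returns
# paths that can be passed straight to read.csv()
iris.files <- list.files(data.dir, pattern = "^iris.*\\.csv$", full.names = TRUE)

# A single colClasses value is recycled across all columns, so everything
# is read as character and leading zeros survive
iris.list <- lapply(iris.files, read.csv, colClasses = "character")

# Stack the data frames into one
all.iris <- do.call("rbind", iris.list)

# Columns are character, so convert before doing arithmetic
mean(as.numeric(all.iris$Sepal.Length))
```

Because the paths are complete, this version does not depend on the working directory being set first.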