As someone who was trained in the point-and-click ways of SPSS and is still very much new to R data programming, I have always approached looping concepts, in general and more specifically in R, with equal parts fear and anxiety. That said, I am well aware that seasoned R programmers tend to develop an aversion to looping constructs, along with skill at substituting vectorized code for them. Even so, I feel that understanding loops, and more importantly ridding myself of my irrational fear of them, is an important part of my development as a data programmer.
Just in case others out there come from the same place I have, I wanted to take an example loop that I recently came across while reading Matloff’s The Art of R Programming (http://www.amazon.com/The-Art-Programming-Statistical-Software/dp/1593273843) and walk through it (if you haven’t yet purchased the text, I would highly recommend you do so). Matloff does briefly walk through the code in the text; however, I found that I needed to be spoon-fed what the for loop was actually doing. Lastly, I want to make it clear that I am not a programmer by training, so forgive any improper use of nomenclature (you have been warned). First, here is the entire piece of code (the dataset can be obtained from the publisher’s website):
aba <- read.csv("~/R_folder/art_of_r_prog_suppl/artofr_data/Abalone.data",header=T,as.is=T)
names(aba) <- c("Gender","Length","Diameter","Height","WholeWt","ShuckedWt","ViscWt","ShellWt","Rings")
grps <- list()
for (gen in c("M","F")) grps[[gen]] <- which(aba[,1]==gen)
abam <- aba[grps$M,]
abaf <- aba[grps$F,]
plot(abam$Length,abam$Diameter)
plot(abaf$Length,abaf$Diameter,pch="x",new=FALSE)
Let’s just focus on the piece of code that contains the actual for loop:
for (gen in c("M","F")) grps[[gen]] <- which(aba[,1]==gen)
1) Let’s talk about for (gen in c("M","F")). As a novice, I found it pretty cool that R is looping here over a character vector, c("M","F"). Essentially, the for loop creates an object gen and steps it through 2 values, "M" and "F". So, spoon-feeding this, gen takes on two values over the course of the loop: first "M", then "F".
2) Next comes what we want the for loop to do on each pass, which is the statement that follows. Here we hand the loop the list grps and ask it to fill the list with stuff (that stuff being 2 vectors of indices generated by the which() function). What is cool about this is that we are taking advantage of the character vector we are looping over to create the names of the vectors in the list, specifically "M", then "F". We use list indexing with the object gen (grps[[gen]]) to create as many items in the list as there are values we are looping over (in this case 2).
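3) Lastly, and really as part of the same statement as step 2, the loop fills the list item it just created with stuff: the vector of row numbers that which() returns for the rows whose first column (Gender) equals the current value of gen.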
Voilà! That’s it.
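If you’d like to watch the three steps happen without downloading the abalone file, here is a small sketch on a made-up data frame. The toy data and its values are my own invention for illustration, not anything from Matloff’s text, and I’ve added curly braces and a cat() call only so that each pass of the loop is visible.

```r
# Hypothetical toy data standing in for the abalone file (values are made up)
toy <- data.frame(Gender = c("M", "F", "I", "M", "F"),
                  Length = c(0.45, 0.53, 0.33, 0.44, 0.55),
                  stringsAsFactors = FALSE)

grps <- list()                           # start with an empty list
for (gen in c("M", "F")) {
  cat("gen is now:", gen, "\n")          # step 1: gen takes each value in turn
  grps[[gen]] <- which(toy[, 1] == gen)  # steps 2 and 3: create the named item and fill it
}

names(grps)   # the list items are named after the values we looped over
grps$M        # row numbers where Gender is "M"
grps$F        # row numbers where Gender is "F"
```

With this toy data, grps$M comes back as rows 1 and 4, and grps$F as rows 2 and 5. The "I" row is simply never picked up, because "I" is not one of the values gen loops over.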
The only thing I’ll add is that after studying the code, I wonder why Matloff didn’t just do it this way:
abaM <- aba[aba$Gender == "M", ] ## and then
abaF <- aba[aba$Gender == "F", ]
avoiding the loop entirely?
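For what it’s worth, here is a sketch comparing the two approaches, again on a made-up data frame (the toy data is hypothetical, not the abalone file). Note the comma inside the square brackets: a logical vector before the comma picks rows, and leaving the spot after the comma empty keeps every column. Without the comma, R treats the data frame like a list and tries to use the logical vector to pick columns instead. I’ve also included base R’s split(), which is one way to get every group in a single call. I’m not suggesting that is what Matloff had in mind.

```r
# Hypothetical toy data (values are made up for illustration)
toy <- data.frame(Gender = c("M", "F", "I", "M", "F"),
                  Length = c(0.45, 0.53, 0.33, 0.44, 0.55),
                  stringsAsFactors = FALSE)

# Loop version, as in the walk-through: store row numbers, then subset
grps <- list()
for (gen in c("M", "F")) grps[[gen]] <- which(toy[, 1] == gen)
toyM_loop <- toy[grps$M, ]

# Loop-free version: a logical vector before the comma selects the rows
toyM <- toy[toy$Gender == "M", ]
toyF <- toy[toy$Gender == "F", ]

# Check that the two approaches give the same rows for "M"
all.equal(toyM_loop, toyM)

# split() returns a named list of data frames, one per Gender value
bygen <- split(toy, toy$Gender)
names(bygen)
bygen$M
```

One difference to keep in mind: the loop only builds groups for the values you list, while split() builds one for every value it finds in the column, including "I" here.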