Confronting my fea-R-s, the list-indexing edition

I don't know why I find these twins scary... but I do...

Lists are very powerful data structures, but since diving into the R pool, I’ve cycled through a spectrum of emotions and opinions about them. My responses have ranged from awe, confusion, wonder and excitement to, most recently, fear and frustration. The latter two spring, in large part, from a lack of comfort around how to efficiently extract items that either I, or a well-meaning function (such as strsplit() or gregexpr(), or heck even lm() and just about every other statistical function), put into a list structure. Coming to R from, I’m ashamed to say (especially after reading this)… SPSS (mainly) with some SAS… I don’t have the benefit of formal object-oriented programming training, or the comfort with list structures that someone coming to R from Python or C++ might have.

BOTTOM LINE: While I feel quite comfortable extracting elements from data.frame structures, I’m all thumbs when it comes to lists, which is kind of ironic because a data.frame is itself a kind of list.
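
To make that irony concrete, here is a minimal sketch showing that the extraction operators familiar from data.frames behave the same way on a plain list. The objects df and lst and their contents are made up purely for illustration; they are not part of the example further down.

```r
# Hypothetical data, invented for illustration only
df  <- data.frame(name = c("Elmo", "Oscar"), age = c(3, 43))
lst <- list(name = c("Elmo", "Oscar"), age = c(3, 43))

# A data.frame really is a list (of equal-length columns)
df_is_list <- is.list(df)

# Double brackets pull out the contents, on either structure
col_from_df  <- df[["age"]]
col_from_lst <- lst[["age"]]
same_contents <- identical(col_from_df, col_from_lst)

# Single brackets keep the container: a one-column data.frame
# in one case, a one-element list in the other
class(df["age"])
class(lst["age"])
```

In other words, whatever works for pulling a column out of a data.frame with [[ ]] or $ should carry over to a named list.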

All that said, I’m a strong believer in the power of confronting fears and addressing one’s weaknesses, and today an occasion presented itself for me to address some of my list-indexing issues. In my example I was using strsplit() to manipulate some string data. What I wanted to do was extract the third word from a vector of strings where each element contained a name with 2+ words. After some hunting around, I found this most helpful post that contained a solution and also gave me some insight into lists. Enough of that, however; here’s what I ended up doing:

V1 <- c( "Mr. Hooper's Store", "Oscar the Grouchs Can", "Big Bird's Nest" , "Maria's pad", "Elmo's Secret Hideout")
V2 <- sapply(strsplit(V1, " "), function(x) x[3]) ## the result of strsplit is a list
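##  this is also insightful: [[c(1,3)]] indexes recursively,
##  i.e. list element 1, then the 3rd item inside it ("Store")
strsplit(V1, " ")[[c(1,3)]]
##  to do the same for every element at once, the magic happens with sapply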
V3 <- strsplit(V1, " ")
##  let's just change our request to the second word now, just for fun!
sapply(V3, function(x) x[2]) ## magic!

So, what I learned is that to extract the same item from every element of a list, I’m going to have to lean on looping constructs and helpful R-ish loop-like functions (of the apply ilk) to get what I want.
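
For reference, here is a small sketch of how the different indexing forms compare on a strsplit() result. The names places, words, third and third_checked are my own, chosen for illustration. The last two lines are variations on the sapply() call: passing `[` directly instead of an anonymous function, and using vapply(), which additionally checks that every result is a single string.

```r
places <- c("Mr. Hooper's Store", "Big Bird's Nest", "Maria's pad")
words  <- strsplit(places, " ")   # a list with one character vector per string

words[1]          # single bracket: still a list, of length 1
words[[1]]        # double bracket: the character vector inside
words[[1]][3]     # third word of the first string
words[[c(1, 3)]]  # recursive indexing, same result as the line above

# "[" is itself a function, so it can be handed to sapply() directly
third <- sapply(words, `[`, 3)

# vapply() does the same job but insists each result is one string
third_checked <- vapply(words, function(x) x[3], character(1))
```

Note that "Maria's pad" has only two words, so asking for its third word gives NA rather than an error.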

