Using gdata to import .xlsx files on 64-bit Windows running 64-bit R


While it is usually a much better bet to convert Excel data to a .csv file and use read.csv, there are rare times when it makes sense to grab the data directly out of Excel. This is the case, for example, when a client sends you multiple Excel files (with multiple sheets per file) that all need to be imported.

There are many packages that do this; however, I kept running into issues because certain packages are only compatible with 32-bit R (or 32-bit Windows), and I run 64-bit R on 64-bit Windows. My solution: the gdata package. While it is not very fast (especially for large Excel files), you can accomplish the above task with gdata's read.xls function. In my case the code looks like this:

library(gdata)
DF <- read.xls(xls = "~/yourpath/filename.xlsx", sheet = 1)

I plan on using a loop (with assign()) to pull in data from 3 sheets in each of many xls files and create the correct names for all of them in R. A word of warning: each of my sheets contained around 60K rows, and read.xls took quite a while to run each import, so plan accordingly if you are dealing with big files. In my case, system.time() told me that an import of ~65K rows took around 3.5 minutes.
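For what it's worth, here is a rough sketch of what that loop could look like. Everything beyond read.xls(), assign() and system.time() is my own assumption for illustration: the helper name import_sheets, the reader argument (which defaults to gdata's read.xls but lets you swap in something else), and the "<file>_sheet<n>" naming scheme are all made up. Also keep in mind that read.xls relies on Perl, so a Perl interpreter has to be available on your machine.

```r
# Sketch only: import several sheets from several Excel files and give
# each resulting data frame its own name via assign().
# `import_sheets`, `reader`, and the "<file>_sheet<n>" naming scheme are
# hypothetical choices, not something gdata provides.
import_sheets <- function(files, sheets = 1:3,
                          reader = gdata::read.xls,
                          envir = globalenv()) {
  created <- character(0)
  for (f in files) {
    # File name without directory or extension, made into a valid R name
    base <- make.names(tools::file_path_sans_ext(basename(f)))
    for (s in sheets) {
      obj_name <- paste0(base, "_sheet", s)
      # Time each import, since read.xls can be slow on large sheets
      elapsed <- system.time(
        df <- reader(xls = f, sheet = s)
      )[["elapsed"]]
      assign(obj_name, df, envir = envir)
      message(sprintf("%s: %d rows in %.1f s", obj_name, nrow(df), elapsed))
      created <- c(created, obj_name)
    }
  }
  # Return the names of the objects that were created, invisibly
  invisible(created)
}

# Example usage (path is a placeholder):
# files <- list.files("~/yourpath", pattern = "\\.xlsx?$", full.names = TRUE)
# import_sheets(files)
```

If you would rather not scatter objects around your workspace, collecting the data frames in a named list instead of calling assign() is a common alternative; I kept assign() here only because that is what I described above.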

BTW, while this does not warrant a full post, I recently discovered the function file.choose(), which opens a file picker and returns the path of the file you select (handy if you are too lazy to type in an actual file name). Or, to put my code above another way (and if you don't care about looping), you can do something like this:

DF <- read.xls(xls = file.choose(), sheet = 1)
