importing data into R directly from a Github repository

(insert cool sound effect)… I think doing so while running the code will also vastly improve performance

I have recently started to get serious about reviewing time series forecasting. My push into the topic has been nudged along partly by Rob J Hyndman and George Athanasopoulos’s online text Forecasting: principles and practice. In addition, there was a really nice contributed blog post on R-bloggers by Corey Chivers @ bayesianbiologist.com that took a stab at forecasting bicycle collision rates in Montreal based on 3 years of data.

Corey’s post encouraged me to explore a lot of really great stuff, but one thing it required was importing some data directly from Corey’s GitHub repository into R, so I thought I would quickly write a post on how that can be done.

Corey’s GitHub repository can be found here, and in it there is a data folder that contains all the data required to replicate his time series forecast of bicycle collisions. Corey has also graciously provided R code for the latter in the same repository.

To import the data, you will need to have the RCurl package installed.

library(RCurl)
url <- 'https://raw.github.com/cjbayesian/collisions/timeseries/data/Bike%20Accidents.csv'
## ssl.verifypeer = FALSE skips SSL certificate verification, which sidesteps the SSL error you get otherwise
bike.data <- getURL(url, ssl.verifypeer = FALSE)
## the file uses a bar (|) separator
d <- read.table(textConnection(bike.data), header=TRUE, sep='|', row.names='id', na.strings=' ')

It’s as easy as that! Now if I can only get that pesky shapefile (montreal_borough_borders.shp) to play nicely with readShapePoly()! So much to learn, so little time! 🙂
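
For anyone curious what the read.table() arguments above are doing, here is a minimal sketch on a tiny made-up string rather than the real file. The sample contents and the names sample.text and d.sample are invented for illustration and are not taken from Corey’s data. The text argument of read.table() stands in for textConnection() so the example runs without a network connection.

```r
# Made-up, bar-separated sample. This is NOT the real contents of
# Corey's file; only the separator and the 'id' column mirror the post.
sample.text <- "id|date|borough\n1|2008-05-01|Plateau\n2|2008-05-02| \n"

# Same arguments as in the post, but reading from a string via text=
d.sample <- read.table(text = sample.text,
                       header = TRUE,       # first line holds column names
                       sep = "|",           # bar-separated fields
                       row.names = "id",    # use the id column as row names
                       na.strings = " ",    # same missing-value marker as the post
                       stringsAsFactors = FALSE)

str(d.sample)
```

Because row.names = "id" moves the id column into the row names, d.sample ends up with two rows and the two remaining columns, date and borough.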

##UPDATE##

OK, so I’ve revisited this as I start to implement a workflow that incorporates Git for version control and such… As I’ve done so, I wanted to post another set of example code:

url <- "https://raw.githubusercontent.com/connerpharmd/LDLCVE/master/secprevstatin.csv"
statins <- getURL(url, ssl.verifypeer = FALSE)
statins <- read.csv(textConnection(statins))
statins <- statins[which(statins$Study != "AtoZ"), ]  ## drop the AtoZ study
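
A small aside on the which() in that last line: it matters when the column being filtered contains missing values. The sketch below uses a made-up data frame; the object names, the ldl column, and all of its values are hypothetical, and only the Study column name and the "AtoZ" label come from the code above.

```r
# Made-up stand-in for the statins data; values are invented.
toy <- data.frame(Study = c("AtoZ", "4S", NA, "CARE"),
                  ldl   = c(81, 117, 100, 98),
                  stringsAsFactors = FALSE)

# which() returns only the positions where the test is TRUE,
# so the row with a missing Study is silently dropped.
with.which <- toy[which(toy$Study != "AtoZ"), ]

# Plain logical indexing passes the NA through to the index,
# which produces an extra row filled with NA.
with.logical <- toy[toy$Study != "AtoZ", ]

nrow(with.which)
nrow(with.logical)
```

Which behaviour is preferable depends on whether rows with a missing Study should survive the filter, so it is worth checking for NAs before choosing one form over the other.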

4 thoughts on “importing data into R directly from a Github repository”

Scott Chamberlain on December 6, 2013 at 8:08 pm said:

Hey, nice post. With httr, no need to do the ssl verify thing:

library(httr)
bike.data <- content(GET(url))
d <- read.table(textConnection(bike.data), header=TRUE, sep="|", row.names="id", na.strings=" ")

- connerpharmd on December 11, 2013 at 6:21 pm said:
  
  Nice Scott, thanks for the tip!
  
  C
  
  - Scott Chamberlain on December 11, 2013 at 11:30 pm said:
    
    Sure thing
connerpharmd on December 14, 2013 at 6:05 pm said:

Scott, I’m trying to get up to speed on some basic web text scraping with R, and I noticed that a presentation on grabbing web data with R came up that you had done (through R-bloggers). To give you an idea, I want to create word clouds of press conference transcripts and compare the word clouds for the same coach post-win and post-loss. Would you know of any great web tutorials that might help me work through an example?
