importing data into R directly from a Github repository

(insert cool sound effect)... I think doing so while running the code also will vastly improve performance

I have recently started to get serious about reviewing time series forecasting. My push into the topic has been nudged along partly by Rob J Hyndman and George Athanasopoulos’s online text Forecasting: principles and practice. In addition, there was a really nice contributed blog post on R-bloggers by Corey Chivers @ bayesianbiologist.com that took a stab at forecasting bicycle collision rates in Montreal based on 3 years of data.

Corey’s post encouraged me to explore a lot of really great stuff, but one thing it required was importing some data directly from his GitHub repository into R, so I thought I would quickly write a post on how that can be done.

Corey’s GitHub repository can be found here, and in it there is a data folder that contains all the data required to replicate his time series forecast of bicycle collisions. Corey has also graciously provided R code for the latter in the same repository.

To import the data, you will need to have the RCurl package installed.

library(RCurl)
url <- 'https://raw.github.com/cjbayesian/collisions/timeseries/data/Bike%20Accidents.csv'
## ssl.verifypeer = FALSE skips SSL certificate verification, which works around the SSL error you get otherwise
bike.data <- getURL(url, ssl.verifypeer = FALSE)
## the file uses a bar (|) separator, and a single space marks a missing value
d <- read.table(textConnection(bike.data), header = TRUE, sep = '|', row.names = 'id', na.strings = ' ')

It’s as easy as that! Now if I can only get that pesky shapefile (montreal_borough_borders.shp) to play nicely with readShapePoly()! So much to learn, so little time! 🙂
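To see what the read.table() arguments in the import above are doing without touching the network, here is a minimal sketch that parses a few lines of bar-separated text held in a string. The column names and values are made up for illustration (only the id column is taken from the code above), and the real file almost certainly looks different. It also uses read.table()'s text argument rather than textConnection(), which should be equivalent here.

```r
# Made-up stand-in for the text that getURL() returns; apart from "id",
# the columns are hypothetical and not taken from the real file.
sample.text <- paste(
  "id|date|borough",
  "1|2008-05-01|Plateau",
  "2| |Rosemont",
  sep = "\n"
)

# Same arguments as in the import above: header row, bar separator,
# "id" used as row names, and a single space treated as NA.
d.sample <- read.table(text = sample.text, header = TRUE, sep = "|",
                       row.names = "id", na.strings = " ",
                       stringsAsFactors = FALSE)

print(d.sample)               # the id column has become the row names
print(is.na(d.sample$date))   # shows which dates were read as missing
```

If the single-space field did not come through as NA on your real data, na.strings would be the first argument I would look at.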

##UPDATE##

OK, so I’ve revisited this as I start to implement a workflow that incorporates Git for version control and such… As I’ve done so, I wanted to post another set of example code:

url <- "https://raw.githubusercontent.com/connerpharmd/LDLCVE/master/secprevstatin.csv"
statins <- getURL(url, ssl.verifypeer = FALSE)  ## needs library(RCurl), loaded earlier
statins <- read.csv(textConnection(statins))
statins <- statins[which(statins$Study != "AtoZ"), ]  ## drop the AtoZ study
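As far as I know, recent versions of R (3.2.0 and later on most platforms) let read.csv() open an https:// URL by itself, so RCurl may not be strictly necessary for this kind of import. The sketch below wraps that in a small helper, read_csv_from(), which is a hypothetical name of my own and not part of any package. So that the example runs offline, it is demonstrated on a temporary file with made-up rows; only the Study column name and the "AtoZ" value come from the code above.

```r
# Hypothetical helper: read a CSV from a local path or an http(s) URL,
# with a clearer error message if the read fails.
read_csv_from <- function(location, ...) {
  tryCatch(
    read.csv(location, ...),
    error = function(e) {
      stop("Could not read ", location, ": ", conditionMessage(e), call. = FALSE)
    }
  )
}

# Offline demonstration on a temporary file. The rows and the "value"
# column are invented purely for illustration.
tmp <- tempfile(fileext = ".csv")
writeLines(c("Study,value", "AtoZ,1", "4S,2"), tmp)

demo <- read_csv_from(tmp)
demo <- demo[demo$Study != "AtoZ", ]  # same filter as for the statins data
print(demo)

# Assumed usage against the repository (needs a network connection):
# statins <- read_csv_from("https://raw.githubusercontent.com/connerpharmd/LDLCVE/master/secprevstatin.csv")
```

If the direct read fails on an older installation or behind a proxy, the getURL() route above is still there as a fallback.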

4 thoughts on “importing data into R directly from a Github repository”

  1. Hey, nice post. With httr, no need to do the ssl verify thing:

    library(httr)
    bike.data <- content(GET(url))
    d <- read.table(textConnection(bike.data), header=TRUE, sep="|", row.names="id", na.strings=" ")

  2. Scott, I’m trying to get up to speed on some basic web text scraping with R, and I noticed that a presentation on grabbing web data with R came up that you had done (through R-bloggers). To give you an idea, I want to create word clouds of press conference transcripts and compare the word clouds for the same coach post-win and post-loss. Would you know of any great web tutorials that might help me work through an example?
