
(insert cool sound effect)… I think doing so while running the code will also vastly improve performance
There was a lot of really great stuff that Corey’s post encouraged me to explore, but one of the things it required was importing some data directly from Corey’s GitHub repository into R, so I thought I would quickly write a post on how that can be done.
Corey’s GitHub repository can be found here, and in it there is a data folder that contains all the data required to replicate his time series forecast of bicycle collisions. Corey has also graciously provided R code for the latter in the same repository.
To import the data, you will need to have the RCurl library installed.
library(RCurl)
url <- 'https://raw.github.com/cjbayesian/collisions/timeseries/data/Bike%20Accidents.csv'
bike.data <- getURL(url, ssl.verifypeer = FALSE)  ## ssl.verifypeer = FALSE works around an SSL error you get otherwise
d <- read.table(textConnection(bike.data), header=TRUE, sep='|', row.names='id', na.strings=' ')  ## file uses bar separator
It’s as easy as that! Now if I can only get that pesky shapefile (montreal_borough_borders.shp) to play nicely with readShapePoly()! So much to learn, so little time! 🙂
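If you want to see what those read.table() arguments are doing without hitting the network, here is a minimal sketch on a made-up, bar-separated string. The column names (borough, injuries) and the values are purely hypothetical stand-ins, not the contents of Corey’s file; only the id column name and the arguments come from the code above.

```r
## Made-up sample standing in for the downloaded text.
## Column names and values are hypothetical, NOT the real file's.
sample.text <- "id|borough|injuries\n1|Plateau|2\n2| |0\n3|Rosemont|1"

## read.table() can parse a string directly through its text= argument
d.sample <- read.table(text = sample.text, header = TRUE, sep = "|",
                       row.names = "id",    ## use the id column as row names
                       na.strings = " ",    ## a lone space means "missing"
                       stringsAsFactors = FALSE)

str(d.sample)
```

The id column becomes the row names rather than a regular column, and the single-space field in the second row is read in as NA.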
##UPDATE##
OK, so I’ve revisited this as I start to implement a workflow that incorporates Git for version control and such… As I’ve done so, I wanted to post another set of example code:
url <- "https://raw.githubusercontent.com/connerpharmd/LDLCVE/master/secprevstatin.csv"
statins <- getURL(url, ssl.verifypeer = FALSE)
statins <- read.csv(textConnection(statins))
statins <- statins[which(statins$Study != "AtoZ"), ]
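A quick aside on that last line: wrapping the condition in which() matters if the Study column ever contains missing values. Here is a small sketch on a toy data frame. Everything in it is made up for illustration except the Study column name and the "AtoZ" label, which come from the code above.

```r
## Toy data: study names and values are hypothetical
toy <- data.frame(Study = c("StudyA", "AtoZ", NA, "StudyB"),
                  value = c(1, 2, 3, 4),
                  stringsAsFactors = FALSE)

## which() returns only the positions where the test is TRUE,
## so the row with a missing Study is dropped
with.which <- toy[which(toy$Study != "AtoZ"), ]

## Plain logical indexing keeps an all-NA row wherever the test is NA
without.which <- toy[toy$Study != "AtoZ", ]

nrow(with.which)
nrow(without.which)
```

In this toy case the which() version keeps two rows, while the plain logical version keeps three, one of which is entirely NA.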
Hey, nice post. With httr, no need to do the ssl verify thing:
library(httr)
bike.data <- content(GET(url))
d <- read.table(textConnection(bike.data), header=TRUE, sep="|", row.names="id", na.strings=" ")
Nice Scott, thanks for the tip!
C
Sure thing
Scott, I’m trying to get up to speed on some basic web text scraping with R, and I noticed that a presentation on grabbing web data with R came up that you had done (through R-bloggers). To give you an idea, I want to create word clouds of press conference transcripts and compare the word clouds for the same coach post-win and post-loss. Would you know of any great web tutorials that might help me work through an example?