Implementing R-assigned variables as part of RODBC Select statements

I do a lot of work now that involves cutting data out of my company’s SQL database and bringing it directly into R via the RODBC package. It’s a really nice way to tidy up my workflow and by having access to all of the aggregation (and join) functions of SQL, I can do much of the heavy lifting before bringing the data into R. As you can imagine, this really beats the pants off of exporting *.csv tables out of an SQL database and then importing them in (which is what I did until I learned how to use the RODBC package).

Recently my R-RODBC workflow productivity took a bit of a quantum leap as I learned how to incorporate variables that I assign in R directly into an RODBC SQLSelect query. The problem I was confronted with was that, when I go into a database, I often want to cut a years worth (or more) of data by complete months (e.g., I do not want 1/2 a month of data). In cases where the most recent entry leaves you with less than a full month of data on the tail end (or beginning), I needed a way for the select statement to pull data up to the end of the PREVIOUS month (essentially ignoring the hanging chad of the partial month). If, on the other hand, the data just so happened to run up to the last day of a month, I needed the data pull to end on the last day we had data in the SQL database. Because my ‘last day’ was dependent upon what was in the database, I needed a way for R to: 1) Check to see what the last day (most recent entry) was in the data, 2) Evaluate it to see if that last day happened to be an end-of-month value, and 3) Feed the right value into my SQLSelect statement.

The keys to how I implemented this solution involved using the frac option from the as.yearmon function in the zoo package. Once the evaluation was complete, I used if statements to assign the correct first and last date values, and lastly I used the paste function to create my SQL statement, then I had to tidy up the statement using the strwrap function so that it could be fed to RODBCs sqlQuery function. It all goes down like this:


library(zoo)
library(RODBC)
ch <- odbcConnect("THE_DSN_YOU_CREATE")
head(sqlTables(ch), n=20) ## just a test to make sure the connection worked

lastfill <- sqlQuery(ch, "SELECT (MAX(DATE)) AS LASTFILL FROM [YOUR SQL DB]") ## pull the last date

x1 <- as.Date(lastfill[1,1])# pull last fill date as element
x2 <- as.Date(as.yearmon(x1), frac=1) ##  create element for last day of month 
x3 <- as.numeric(x2-x1) ##  test, result = 0 if lastfill goes to last day of month


if (x3 > 0){ ##  if lastfill is not at end of month
    lastyearmon <- (as.yearmon(maxyearmon, '%Y%m') - 1/12) ##  The most efficient way for me to cut the data by date is using the "YEAR MONTH" variable in our DB
    firstyearmon <- lastyearmon-(1-(1/12))
}

if (x3 == 0){  ## if last fill IS at end of a month
  lastyearmon <- (as.yearmon(maxyearmon, '%Y%m'))
  firstyearmon <- lastyearmon-(1-(1/12))
}

 

sql.select <- paste("  ## notice my use of tabs and hard returns to make the statement more legible (highly recommended), we fix these below
                    SELECT 
                    *
FROM  [YOUR SQL DB]
WHERE (YEARMONTH>= '", firstyearmon ,"' AND YEARMONTH <= '", lastyearmon, "')", sep="") ##  Simple SQL SELECT statement, but you can get really fancy here if you like.


data <- sqlQuery(ch, 
                 strwrap(sql.select, width=nchar(sql.select))) ##  use strwrap to clean up hard returns 

close(ch)

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s