# Curating data to eliminate partial quarters, months (or even years, if that’s your game)… Step 1: Insert data, Step 2: Collect money… If only it were that simple…

Often, I’m pulling data into R from datasets that are updated monthly; however, there are many cases where I am interested in aggregating data by quarter. In these cases, I need to make sure that each aggregation bin represents a full quarter (and full months as well, but I tackled that in an earlier post).
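A quick way to spot incomplete quarterly bins is to count how many distinct months land in each quarter: any quarter with fewer than three is partial. The sketch below assumes the zoo package is installed and uses a made-up date range that ends in April, so the final quarter holds only one month.

```r
## Sketch: count distinct months per quarter; fewer than 3 means a partial quarter.
## Assumes the zoo package; the date range is illustrative only.
library(zoo)

date  <- seq(as.Date("2011-01-01"), as.Date("2012-04-30"), by = 1)
yrqtr <- as.yearqtr(date)
yrmon <- as.yearmon(date)

# number of distinct months observed in each quarter
months_per_qtr <- tapply(yrmon, yrqtr, function(m) length(unique(m)))
months_per_qtr

# labels of the quarters that are missing at least one month
names(months_per_qtr)[months_per_qtr < 3]
```

Unlike checking only the latest month, this also flags a partial first quarter. It cannot tell whether the last month itself is complete, though; that is the partial-month problem.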

Most of the time I use RODBC to bring my data from our warehouse into R. I suppose you could implement the technique below as part of the import step, but in this particular case I applied the code below after the data had been imported into R.

```r
## simulate some data

library(zoo)
set.seed(123) # make the simulated values reproducible
n <- 100 # mean of the simulated values
date <- seq(as.Date('2011-01-01'), as.Date('2012-04-30'), by = 1)
values <- rnorm(length(date), n, 25)
df1 <- data.frame(date, values)
df1$yrqtr <- as.yearqtr(df1$date)
df1$yrmon <- as.yearmon(df1$date)

tapply(df1$values, df1$yrqtr, sum) # beware: the last quarter is incomplete
range(df1$yrqtr) # you can't tell by looking here
range(df1$yrmon) # but you can tell by looking here

# this code fixes the problem
last_month  <- as.yearmon(as.Date(max(df1$yrqtr), frac = 1)) # LAST month of the latest quarter we have data for
first_month <- as.yearmon(as.Date(max(df1$yrqtr), frac = 0)) # FIRST month of the latest quarter we have data for
# if the latest month in the data is not the final month of the latest quarter,
# that quarter is partial, so cut the data back to the last full quarter
if (max(df1$yrmon) != last_month) {
  df1 <- df1[df1$yrmon < first_month, ]
}

tapply(df1$values, df1$yrqtr, sum) # this is more like it!
range(df1$yrqtr)
range(df1$yrmon)

```
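If you do this trimming regularly, the same logic can be wrapped in a function. This is only a sketch: the name `drop_partial_last_quarter()` and its `date_col` argument are hypothetical. Like the code above, it only removes a trailing partial quarter. A partial first quarter or a partial final month would need separate handling.

```r
## Hypothetical helper wrapping the trailing-quarter trim shown above.
## Assumes the zoo package and a data frame with a Date column.
library(zoo)

drop_partial_last_quarter <- function(df, date_col = "date") {
  dates <- df[[date_col]]
  yrmon <- as.yearmon(dates)
  last_qtr <- max(as.yearqtr(dates))
  # last and first months of the latest quarter present in the data
  last_month  <- as.yearmon(as.Date(last_qtr, frac = 1))
  first_month <- as.yearmon(as.Date(last_qtr, frac = 0))
  # if the data stop before that quarter's final month, drop the whole quarter
  if (max(yrmon) != last_month) {
    df <- df[yrmon < first_month, , drop = FALSE]
  }
  df
}

# usage on simulated data that ends partway through a quarter (April 2012)
set.seed(42)
date <- seq(as.Date("2011-01-01"), as.Date("2012-04-30"), by = 1)
df1 <- data.frame(date, values = rnorm(length(date), 100, 25))
trimmed <- drop_partial_last_quarter(df1)
range(trimmed$date)
```

Running the function a second time on already-trimmed data leaves it unchanged, because the latest month then closes its quarter.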