Comment characters and importing data with read.xxx functions in R… Lost in Translation?

Sometimes Google Translate just doesn’t do it for you

Work has been busy, so I’ve had to take a quick break from blogging, but I recently stubbed my toe on something I wanted to quickly pen a post about.

I was importing some MediSpan data (the HCPCS code file) for a claims analysis, and my import script kept throwing an error. The script and the error looked something like this:

NOTE: Apologies in advance. I’m having an issue with HTML turning my straight quotes into directional (curly) quotes. I’m sure it’s more user error than anything, but be warned that you may have to change any curly quotes back to straight ones before the code will work in R.

library(stringr)  # for str_trim()

# Build column names, widths and classes from the data dictionary (hcpdict)
xnames <- tolower(gsub(' |-', '\\.', str_trim(hcpdict[grep('^M', hcpdict$field.identifier), 'field.description'])))
xwidth <- hcpdict[grep('^M', hcpdict$field.identifier), 'field.length']
xcolclasses <- hcpdict[grep('^M', hcpdict$field.identifier), 'field.type']
xcolclasses <- ifelse(xcolclasses == 'C', 'character', xcolclasses)
xcolclasses <- ifelse(xcolclasses == 'N', 'numeric', xcolclasses)
hcpcode <- read.fwf('C:\\Users\\Chris.Conner\\Documents\\CI\\Lego_pie\\rawdat\\DIDB\\HCPCS\\HCPCODE',
                    widths = xwidth, col.names = xnames, colClasses = xcolclasses)


# Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings,  : 
# line 2739 did not have 19 elements

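A quick inspection of the raw file, and I surmised that it was a pound symbol “#” throwing things off. Essentially, R treats this symbol as the start of a comment, even within files that you are importing: read.table(), and read.fwf(), which hands its extra arguments on to read.table(), use comment.char = '#' by default, so everything after a “#” on a line is dropped and the line comes up short. (read.csv() and read.delim() default to comment.char = '', so they are not affected.)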
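
To see which of the base readers treat “#” as a comment by default, one option is to inspect their argument defaults directly. This is only a small sketch using formals() from base R. The names readers and defaults are made up for illustration and are not part of my import script.

```r
# Collect the default comment.char of the base readers that define one.
readers <- list(read.table = read.table,
                read.csv   = read.csv,
                read.delim = read.delim)
defaults <- sapply(readers, function(f) formals(f)$comment.char)
print(defaults)

# read.fwf() has no comment.char argument of its own; anything passed
# through '...' is handed on to read.table().
has_own_arg <- "comment.char" %in% names(formals(read.fwf))
print(has_own_arg)
```

A reader whose default is '#' will silently cut each line at the first pound symbol unless comment.char is overridden.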

A quick search uncovered this excellent post that showed me the way out of my jam. I amended my code by adding comment.char = '' to the read.fwf() call and voilà(!) all fixed.

# Same setup as before; the only change is comment.char = '' in read.fwf()
xnames <- tolower(gsub(' |-', '\\.', str_trim(hcpdict[grep('^M', hcpdict$field.identifier), 'field.description'])))
xwidth <- hcpdict[grep('^M', hcpdict$field.identifier), 'field.length']
xcolclasses <- hcpdict[grep('^M', hcpdict$field.identifier), 'field.type']
xcolclasses <- ifelse(xcolclasses == 'C', 'character', xcolclasses)
xcolclasses <- ifelse(xcolclasses == 'N', 'numeric', xcolclasses)
hcpcode <- read.fwf('C:\\Users\\Chris.Conner\\Documents\\CI\\Lego_pie\\rawdat\\DIDB\\HCPCS\\HCPCODE',
                    widths = xwidth, col.names = xnames, colClasses = xcolclasses,
                    comment.char = '')  # treat '#' as data, not as a comment
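
For anyone who wants to reproduce the behaviour without the MediSpan file, here is a minimal sketch on a made-up three-record fixed-width file. The layout (a 5-character code, a 10-character description, a 1-character flag), the column names and the object names are all hypothetical. The default call is wrapped in tryCatch() because I would expect it to fail with an error much like the one above, but either way the “#” data does not survive.

```r
# Hypothetical fixed-width file: code (5 chars), description (10), flag (1).
# The second record has a '#' inside its description.
tmp <- tempfile()
writeLines(c("A0001Item one  Y",
             "A0002Item #2   N",
             "A0003Item threeY"), tmp)

widths <- c(5, 10, 1)
cols   <- c("code", "description", "flag")

# Default behaviour: read.table()'s comment.char = "#" applies, so the
# rest of record 2 after the '#' is discarded.
default_try <- tryCatch(
  read.fwf(tmp, widths = widths, col.names = cols, colClasses = "character"),
  error = function(e) conditionMessage(e)
)

# Fix: switch comment handling off so '#' is read as ordinary data.
fixed <- read.fwf(tmp, widths = widths, col.names = cols,
                  colClasses = "character", comment.char = "")

print(default_try)
print(fixed)
unlink(tmp)
```

With comment.char = "" all three records come back and the description of the second one still contains the pound symbol.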

failuretoconverge

father, husband, son, brother, data hacker, seeking optimization
