1.7b What are data frames?

This was originally going to be a very long aside in 1.7,
but I decided to make it a new post instead.

What are data frames?
str(clean_data)
They are a data structure in R, and one of the most important ones. I tell students to think of a data frame as an Excel sheet,
except that every column of the “Excel sheet” must be the same length, and a missing value shows up as NA rather than as an empty cell.

Also, each column holds a single variable type (num, chr, and so on).

For example, clean_data$avgTemp can’t hold something like … ,25,”hello”,35 .
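
To see both rules in action, here is a minimal sketch on a small made-up data frame. The name toy and its values are invented for illustration; this is not the clean_data used in this series.

```r
# Hypothetical toy data, not the real clean_data.
toy <- data.frame(
  TMAX = c(40, 38, 36),
  TMIN = c(27, 25, 24)
)
toy$avgTemp <- (toy$TMAX + toy$TMIN) / 2

# Every column has the same number of entries as the data frame has rows.
nrow(toy)
sapply(toy, length)

# Every column has exactly one type.
sapply(toy, class)

# Columns whose lengths don't match (and can't be recycled evenly) are rejected.
# tryCatch() captures the error message instead of stopping the script.
result <- tryCatch(
  data.frame(a = 1:3, b = 1:2),
  error = function(e) conditionMessage(e)
)
result
```

Note that R will quietly recycle a shorter column when its length divides the longer one evenly, which is why the sketch uses lengths 3 and 2.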

Well, I decided to check that claim.
clean_data$avgTemp[2]='hello'
str(clean_data)

Rather than raising an error, this converts the entire column (vector) to “chr” (character) type, so the numbers in it become strings too.
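
The same thing happens with a plain vector, which makes the behavior easier to see. A small sketch, using a made-up vector x rather than the real column:

```r
# Hypothetical values standing in for a temperature column.
x <- c(33.5, 31.5, 30)
class(x)          # "numeric"

# Assigning a string into one slot coerces the whole vector.
x[2] <- "hello"
class(x)          # "character"
x                 # "33.5" "hello" "30"
```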

Now if you try adding the first and third values (33.5 + 30) from this column, R returns:
clean_data$avgTemp[1] + clean_data$avgTemp[3]
"Error in clean_data$avgTemp[1] + clean_data$avgTemp[3] :
non-numeric argument to binary operator"

That’s because you can’t add strings in R. This is not Python: "hello"+"ma'am" does not work.

If you want to concatenate (combine) strings in R, use the paste() function.
Note that a comma separates the strings (there is no + sign here).
paste(clean_data$avgTemp[1],clean_data$avgTemp[3])
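
By default paste() puts a space between the pieces. A short sketch of the common variations, with made-up strings a and b standing in for the two column entries:

```r
# Hypothetical strings, like the ones left in the column after coercion.
a <- "33.5"
b <- "30"

with_space <- paste(a, b)             # default separator is a single space
with_plus  <- paste(a, b, sep = "+")  # choose your own separator
no_gap     <- paste0(a, b)            # paste0() uses no separator at all

with_space   # "33.5 30"
with_plus    # "33.5+30"
no_gap       # "33.530"
```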

Suppose you accidentally insert a string. Let’s try to convert the column back.
clean_data$avgTemp[2]='hello'
## Whoops, let's convert that back to num (number or decimal) type.
clean_data$avgTemp=as.numeric(clean_data$avgTemp)
### R introduces NA (with a warning) wherever a string can't be converted to a number. That's really nice behavior!
### Next let's try adding numbers together again.
clean_data$avgTemp[1] + clean_data$avgTemp[3]
# This returns:
# [1] 63.5
# yay!
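
After a conversion like this, it is worth checking where the NA values ended up. A minimal sketch on a made-up character vector (x and nums are illustrative names, not part of clean_data):

```r
# Hypothetical column contents after the accidental 'hello'.
x <- c("33.5", "hello", "30")

# as.numeric() warns "NAs introduced by coercion"; suppressWarnings()
# hides that warning here because the NA is expected.
nums <- suppressWarnings(as.numeric(x))
nums                      # 33.5   NA 30.0

# Find which entries failed to convert.
bad <- which(is.na(nums))
bad                       # 2

# NA propagates through arithmetic unless you ask R to skip it.
sum(nums)                 # NA
sum(nums, na.rm = TRUE)   # 63.5
```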

Since we don’t have very much data, let’s just recompute the entire average column.
clean_data$avgTemp=((clean_data$TMAX+clean_data$TMIN)/2.0)
and now we are back to where we started.
Done, phew.
