0. Getting Started

For an R example, I decided to walk through a statistical analysis of some publicly available data.
For my master’s research I have analyzed a lot of data for the Champaign-Urbana area, so let’s mix it up and, for fun, analyze data from my hometown, Galesburg, IL.

Where can one get weather data?
Initially I was going to use data from the Illinois State Water Survey site below, but it only has monthly data from 1948 to 2014.
Galesburg IL Illinois State Water Survey Climate Data

My next source to consider was the NCDC, and yes, they have daily summary data available for Galesburg. Specifically, I used the search tool (the big blue box) on this page: NCDC Climate Data Online Web. I have saved you the trouble of waiting for a download link and posted the data here. I also included the clean data further down.
Raw NCDC Galesburg Daily Wx Data
Metadata for Raw Data

I cleaned up the data using the R code below.

#source('clean_data.R')
raw_data=read.csv("ncdc_galesburg_daily_climate.csv",header=TRUE,row.names=NULL,stringsAsFactors=FALSE)
### first as.character() converts the integers to text
### then as.Date() converts this data to R Date objects.
raw_data$Date=as.Date(as.character(raw_data$DATE),format='%Y%m%d')

### select only the data we want
raw_data=raw_data[c('Date','TMAX','TMIN','PRCP')]
### sort the dataframe by dates
raw_data=raw_data[order(raw_data$Date), ]

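### -9999 marks a missing value, so replace it with NA before any arithmetic
raw_data$TMAX[raw_data$TMAX==-9999]=NA
raw_data$TMIN[raw_data$TMIN==-9999]=NA
raw_data$PRCP[raw_data$PRCP==-9999]=NA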

### the raw values are stored in tenths, so divide by 10.0
### (temperatures are tenths of degrees C; check the metadata file for the PRCP unit)
raw_data$TMAX=raw_data$TMAX/10.0
raw_data$TMIN=raw_data$TMIN/10.0
raw_data$PRCP=raw_data$PRCP/10.0

### convert Celsius to Fahrenheit
raw_data$TMAX=raw_data$TMAX*(9.0/5.0)+32.0
raw_data$TMIN=raw_data$TMIN*(9.0/5.0)+32.0

### write the data out into csv for future usage
write.csv(raw_data,file='ncdc_galesburg_daily_clean.csv',row.names=FALSE)

Clean NCDC Galesburg Daily Wx Data.
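If you want to convince yourself the cleaning steps do what they claim before running them on the full file, here is a small sketch that applies the same steps to a made-up two-row data frame. The numbers are invented for illustration, and `toy` is just a name I picked. I am also assuming the raw layout described above (integer dates, tenths of degrees C, -9999 for missing).

```r
# Toy rows in the raw layout assumed above. These values are made up,
# they are NOT real Galesburg observations.
toy = data.frame(DATE = c(20140102L, 20140101L),
                 TMAX = c(-9999L, 100L),
                 TMIN = c(-50L, 0L),
                 PRCP = c(25L, 0L))

# integer -> text -> Date, then keep the columns we want, sorted by date
toy$Date = as.Date(as.character(toy$DATE), format = '%Y%m%d')
toy = toy[order(toy$Date), c('Date', 'TMAX', 'TMIN', 'PRCP')]

# -9999 is the missing-value marker
toy$TMAX[toy$TMAX == -9999] = NA

# tenths of degrees C -> degrees C -> degrees F
toy$TMAX = (toy$TMAX / 10.0) * (9.0 / 5.0) + 32.0
toy$TMIN = (toy$TMIN / 10.0) * (9.0 / 5.0) + 32.0
toy$PRCP = toy$PRCP / 10.0

print(toy)
```

The row with TMAX of 100 tenths (10 degrees C) should come out as 50 degrees F, and the -9999 should come out as NA rather than a very cold day.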

I focus on being able to reproduce my results on demand, so at this point I never use R history. I just create R scripts and run them with source('filename.R'). If you are interested in another way of saving data to and from R, I really enjoy this blog post about using saveRDS and readRDS (R's binary format): http://www.fromthebottomoftheheap.net/2012/04/01/saving-and-loading-r-objects
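For the curious, here is a minimal sketch of the saveRDS and readRDS round trip. The data frame `clean_data` and the temporary file path are stand-ins I made up for the example, not part of the scripts above.

```r
# Hypothetical stand-in for the cleaned data frame
clean_data = data.frame(Date = as.Date(c('2014-01-01', '2014-01-02')),
                        TMAX = c(50.0, NA),
                        stringsAsFactors = FALSE)

# tempfile() keeps the example from writing into your working directory
rds_path = tempfile(fileext = '.rds')
saveRDS(clean_data, file = rds_path)

# readRDS() returns the object, so you choose the name it gets
restored = readRDS(rds_path)

# the Date column comes back as a Date, with no as.Date() step needed
print(class(restored$Date))
print(all.equal(clean_data, restored))
```

One thing I like about this route is that column types survive the trip, whereas a csv file hands the dates back as plain text that has to be converted again.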

I’m not going to go over installing R; it should be as easy as downloading the installer for your operating system and running it. On Windows and Mac, using R means working in a GUI program, and the trickiest thing to remember at first is to change the working directory so that it points to the data.
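If you would rather set the working directory from the console than hunt through the GUI menus, something like the sketch below should work. The folder name `~/galesburg_wx` is hypothetical; swap in wherever you saved the csv files.

```r
# Hypothetical folder; replace with wherever you saved the csv files
data_dir = '~/galesburg_wx'

# where R is looking right now
print(getwd())

# only change directory if the folder actually exists
if (dir.exists(data_dir)) {
  setwd(data_dir)
}

# TRUE only if the raw file is visible from the working directory
print(file.exists('ncdc_galesburg_daily_climate.csv'))
```

If the last line prints FALSE, read.csv() will fail with a "cannot open file" error, so it is a quick check to run before sourcing the cleaning script.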

Next!
