Time Series Data in R

Question

I have a basic understanding of R, mostly limited to running regressions and summary statistics, so if there are any gaps in my knowledge I would appreciate being pointed in the right direction.

I have time series data in CSV that is formatted as follows:

Facility ID, Utility Type, Account No, Unit Name, Date 1, Date 2, Date 3, Date 4

There will be multiple rows for a given account number, each referencing a unique utility type and facility (i.e., one row for Unit Name = L, one row for Unit Name = USD). The values for a particular unit at each date are entered in the corresponding "date" column. I would like to write a script that re-exports the data so that each Date column no longer contains entries for multiple units. I would then like to tell R that the Date columns represent monthly time series data points, and from there do various time series analyses.

I appreciate your help in telling me how to clean up this data.

As requested, sample data:

Facility ID, Facility Name, State, Utility Type, Supplier, Account No., Unit Name, 7/1/14, 8/1/14
4015, Palm Court Apts, CA, Chilled Water, PG&E, 87993, USD, 42333, 41775
4015, Palm Court Apts, CA, Chilled Water, PG&E, 87993, ton-hr, 244278, 238035
4044, 18 Sawtelle, CA, Natural Gas, Chevron, 17965, USD, 4860, 5890
4044, 18 Sawtelle, CA, Natural Gas, Chevron, 17965, M^3, 7639, 8895

Example output:

Facility ID, Facility Name, State, Utility Type, Supplier, Account No., Quantity Consumed, Unit of Measure, Utility Bill, Currency, Date
4015, Palm Court Apts, CA, Chilled Water, PG&E, 87993, 244278, ton-hr, 42333, USD, 7/1/14
4015, Palm Court Apts, CA, Chilled Water, PG&E, 87993, 238035, ton-hr, 41775, USD, 8/1/14
4044, 18 Sawtelle, CA, Natural Gas, Chevron, 17965, 7639, M^3, 4860, USD, 7/1/14
4044, 18 Sawtelle, CA, Natural Gas, Chevron, 17965, 8895, M^3, 5890, USD, 8/1/14

Answer 1

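library(reshape2)

# Read the wide CSV; read.csv turns "Facility ID" into Facility.ID, etc.
d <- read.csv("data.csv")

# Keep the descriptive columns as identifiers; every remaining (date) column
# is stacked into a Date/value pair
d.molten <- melt(d,
  id.vars = c("Facility.ID", "Facility.Name", "State", "Utility.Type",
              "Supplier", "Account.No.", "Unit.Name"),
  variable.name = "Date"
)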

The melt function converts a "wide" format (with an arbitrary number of date columns) into a "long" format, where each row is a single observation. This is the preferred format for most things you'd do in R, especially time series, at least when using packages from the "Hadleyverse".

But we're not done yet. Now you have the following structure:

Facility.ID    Facility.Name …  Date  value
       4015  Palm Court Apts X7.1.14  42333

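We have to fix the dates, which are currently just "strings" (a factor of column labels) rather than real dates. read.csv prepended an "X" and replaced the slashes with dots, because syntactically valid column names cannot start with a digit or contain slashes.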

d.molten$Date <- as.Date(as.character(d.molten$Date), format = "X%m.%d.%y")

Now your dates will look correct, and you have one row for each observation:

Facility.ID    Facility.Name …     Date  value
       4015  Palm Court Apts 2014-07-01  42333
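
The long table still stacks the cost rows (Unit.Name = USD) on top of the consumption rows, whereas the example output in the question puts them side by side. Here is a minimal sketch of that last step. It assumes every account has exactly one USD row and one physical-unit row per date, which holds for the sample data but has not been checked against the full file. The inline csv string stands in for data.csv, the output column names are borrowed from the question's example, and the output file name utility_tidy.csv is made up for illustration.

```r
library(reshape2)

# Sample rows from the question, inlined so the sketch runs without data.csv
csv <- "Facility ID,Facility Name,State,Utility Type,Supplier,Account No.,Unit Name,7/1/14,8/1/14
4015,Palm Court Apts,CA,Chilled Water,PG&E,87993,USD,42333,41775
4015,Palm Court Apts,CA,Chilled Water,PG&E,87993,ton-hr,244278,238035
4044,18 Sawtelle,CA,Natural Gas,Chevron,17965,USD,4860,5890
4044,18 Sawtelle,CA,Natural Gas,Chevron,17965,M^3,7639,8895"
d <- read.csv(text = csv, stringsAsFactors = FALSE)

keys <- c("Facility.ID", "Facility.Name", "State", "Utility.Type",
          "Supplier", "Account.No.")
d.molten <- melt(d, id.vars = c(keys, "Unit.Name"), variable.name = "Date")
d.molten$Date <- as.Date(as.character(d.molten$Date), format = "X%m.%d.%y")

# Assumption: currency rows are exactly those with Unit.Name == "USD"
is_cost <- d.molten$Unit.Name == "USD"
cols <- c(keys, "Date", "Unit.Name", "value")
cost  <- d.molten[is_cost, cols]
usage <- d.molten[!is_cost, cols]

# Rename the last two columns (Unit.Name, value) to match the example output
names(cost)[names(cost) %in% c("Unit.Name", "value")] <-
  c("Currency", "Utility.Bill")
names(usage)[names(usage) %in% c("Unit.Name", "value")] <-
  c("Unit.of.Measure", "Quantity.Consumed")

# One row per account and date, with consumption and cost side by side
tidy <- merge(usage, cost, by = c(keys, "Date"))
tidy <- tidy[order(tidy$Account.No., tidy$Date), ]

# Re-export; the file name is a placeholder
write.csv(tidy, "utility_tidy.csv", row.names = FALSE)
```

If an account has several physical units, or a currency other than USD, the split on Unit.Name would need to be adjusted before merging.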

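And now we can easily plot the time series. Note that value still mixes cost (USD) and consumption rows, so in practice you may want to filter or facet by Unit.Name first: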

library(ggplot2)
ggplot(d.molten, 
  aes(x = Date, y = value, color = Facility.Name)) + 
  geom_point()
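
The question also asks how to tell R that the values are monthly. One base-R option is a ts object with frequency = 12. The sketch below is only an illustration: the two consumption values are typed in by hand from the sample rows for account 87993, and it assumes there are no missing months. With real data you would pass the value column for a single account and unit, sorted by date.

```r
# Consumption (ton-hr) for account 87993, copied from the question's sample rows
consumption <- c(244278, 238035)

# frequency = 12 marks the series as monthly; start = c(2014, 7) is July 2014
usage.ts <- ts(consumption, start = c(2014, 7), frequency = 12)

print(usage.ts)   # printed with month labels
time(usage.ts)    # observation times as fractional years
```

Functions such as decompose() need at least two full years of observations, so a two-point series like this one only demonstrates the structure.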

1 answer

solution1
0 ACCEPTED 2015-02-12 21:19:22
