
Manipulating zoo object column after imputation

I have a large hourly time series of temperatures. There were a number of missing values (NA) in the series, so I used linear interpolation from the imputeTS package to impute them. Before the interpolation, I was told to create a column for the imputed values as a zoo object. This replaced any NA temperatures with imputed ones.

I am doing a heating degree day analysis, which measures the heating required to bring a building up to room temperature. If the outside temperature is below 15.5 degrees, heating is required. I want to ignore (or set to NA) values above 15.5 and focus only on the temperatures below it. I then want to calculate the heating degree days as (15.5 - Temp) * 1/24 (24 hours in a day). This is usually simple, but I am having trouble with the zoo object. Can anyone help?
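The formula from the question can be written as a small vectorised function on an ordinary numeric vector. This is a minimal base-R sketch; the name `hourly_hdd` and its `base` and `above` arguments are hypothetical, and `above` controls whether hours warmer than the base temperature become NA (the default here) or 0.

```r
# Hypothetical helper: hourly contribution to heating degree days,
# (base - temp) / 24, for hours colder than the base temperature
hourly_hdd <- function(temp, base = 15.5, above = NA_real_) {
  stopifnot(is.numeric(temp))
  hdd <- (base - temp) / 24
  # Hours warmer than the base need no heating: replace them with `above`
  hdd[!is.na(temp) & temp > base] <- above
  hdd
}

hourly_hdd(c(3.5, 15.5, 20))             # 0.5, 0, NA
hourly_hdd(c(3.5, 15.5, 20), above = 0)  # 0.5, 0, 0
```

Because the function only uses vector arithmetic, it needs a plain numeric column, not a zoo object.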

An example of the data is:

DateTimes <- as.POSIXct(c("2009-01-01 00:00:00", "2009-01-01 01:00:00", "2009-01-01 02:00:00", "2009-01-01 03:00:00", "2009-01-01 04:00:00", "2009-01-01 05:00:00", "2009-01-01 06:00:00"))
MeanTemp <- c(0.8, 0.7, 0.7, NA, 0.8, 0.9, 1.1)

HourTemp <- data.frame(DateTimes, MeanTemp) 

These are my imputation steps:

library(zoo)
library(imputeTS)

#Use linear interpolation to impute missing values
TempImp <- zoo(HourTemp$MeanTemp, HourTemp$DateTimes)
TempImp <- imputeTS::na_interpolation(TempImp, option = "linear")
#Add imputed values to data (HourTemp is already a data.frame)
HourTemp$airTempImp <- round(TempImp, 1)
#Add imputed flag
HourTemp$Imputed <- ifelse(is.na(HourTemp$MeanTemp), "Imputed", "Observed")
HourTemp

The imputation worked, replacing the NA values with estimates, but I cannot manipulate the zoo column airTempImp to create a heating degree days column as described above.

I have tried ifelse, ifelse.zoo and transform, but none of them seem to work!

Thanks!

It sounds like you haven't converted the zoo object to a more generic R object (but you haven't given an error message or code that produces it, so I can't be 100% sure).

In that case, you can use the as.vector function to convert the zoo object into a plain vector, which you can then add to a data.frame. See https://www.rdocumentation.org/packages/zoo/versions/1.8-6/topics/as.zoo for details.
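zoo's `coredata()` is another way to pull the bare values out of a zoo object. A minimal sketch, assuming the zoo package is installed; it follows the question's variant of setting warm hours to NA rather than 0.

```r
library(zoo)

DateTimes <- as.POSIXct(c(
  "2009-01-01 00:00:00", "2009-01-01 01:00:00",
  "2009-01-01 02:00:00", "2009-01-01 03:00:00",
  "2009-01-01 04:00:00", "2009-01-01 05:00:00", "2009-01-01 06:00:00"))
MeanTemp <- c(0.8, 0.7, 0.7, NA, 0.8, 0.9, 1.1)
HourTemp <- data.frame(DateTimes, MeanTemp)

# Interpolate on the zoo object, as in the question
TempImp <- na.approx(zoo(HourTemp$MeanTemp, HourTemp$DateTimes))

# coredata() drops the time index, so the new column is an ordinary
# numeric vector rather than a zoo object
HourTemp$airTempImp <- coredata(TempImp)
class(HourTemp$airTempImp)  # "numeric"

# Ordinary vectorised code now works on the column
HourTemp$HeatingDegreeDay <- ifelse(HourTemp$airTempImp > 15.5, NA,
                                    (15.5 - HourTemp$airTempImp) / 24)
```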

The example code below drops imputeTS, as G. Grothendieck suggested in his comment, because zoo's na.approx already does linear interpolation.
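If neither zoo nor imputeTS is needed for anything else, base R's `approx()` can do the same linear interpolation against the time stamps. This is a swapped-in technique, not what the answer uses; it is a sketch that assumes UTC time stamps (to sidestep daylight-saving gaps). With the default `rule = 1`, leading or trailing NAs would stay NA.

```r
DateTimes <- as.POSIXct(c(
  "2009-01-01 00:00:00", "2009-01-01 01:00:00",
  "2009-01-01 02:00:00", "2009-01-01 03:00:00",
  "2009-01-01 04:00:00", "2009-01-01 05:00:00", "2009-01-01 06:00:00"),
  tz = "UTC")
MeanTemp <- c(0.8, 0.7, 0.7, NA, 0.8, 0.9, 1.1)

# approx() drops the (time, NA) pair and interpolates linearly against the
# time stamps in seconds, evaluated back at every time stamp
t_sec <- as.numeric(DateTimes)
airTempImp <- approx(x = t_sec, y = MeanTemp, xout = t_sec)$y

airTempImp  # plain numeric vector; the 03:00 value is 0.75
```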

# install.packages("zoo")
library("zoo")

DateTimes <- as.POSIXct(c(
  "2009-01-01 00:00:00", "2009-01-01 01:00:00",
  "2009-01-01 02:00:00", "2009-01-01 03:00:00",
  "2009-01-01 04:00:00", "2009-01-01 05:00:00", "2009-01-01 06:00:00"))
MeanTemp <- c(0.8, 0.7, 0.7, NA, 0.8, 0.9, 1.1)
HourTemp <- data.frame(DateTimes, MeanTemp)
TempImp <- zoo(HourTemp$MeanTemp, HourTemp$DateTimes)

# use zoo's linear interpolation
HourTemp$airTempImp <- as.vector(na.approx(TempImp))
HourTemp$Imputed <- ifelse(is.na(HourTemp$MeanTemp), "Imputed", "Observed")

# heating degree days for each hour: 0 if temp > 15.5 (no heating),
# otherwise (15.5 - temp) / 24
HourTemp$HeatingDegreeDay <- ifelse(
  HourTemp$airTempImp > 15.5,
  0, # no heating
  (15.5 - HourTemp$airTempImp) / 24
)

which will output:

HourTemp
            DateTimes MeanTemp airTempImp  Imputed HeatingDegreeDay
1 2009-01-01 00:00:00      0.8       0.80 Observed        0.6125000
2 2009-01-01 01:00:00      0.7       0.70 Observed        0.6166667
3 2009-01-01 02:00:00      0.7       0.70 Observed        0.6166667
4 2009-01-01 03:00:00       NA       0.75  Imputed        0.6145833
5 2009-01-01 04:00:00      0.8       0.80 Observed        0.6125000
6 2009-01-01 05:00:00      0.9       0.90 Observed        0.6083333
7 2009-01-01 06:00:00      1.1       1.10 Observed        0.6000000
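The `ifelse()` in this answer can also be written with `pmax()`, which clips negative contributions to 0. The sketch below also shows the NA variant from the question and, assuming daily totals are wanted (the question only asks for the hourly column), sums the hourly contributions per calendar date. The temperatures are copied from the output above; the UTC time zone is an assumption.

```r
# Imputed hourly temperatures, copied from the output above
airTempImp <- c(0.80, 0.70, 0.70, 0.75, 0.80, 0.90, 1.10)
DateTimes <- as.POSIXct("2009-01-01 00:00:00", tz = "UTC") + 3600 * 0:6

# pmax() turns hours warmer than 15.5 into 0, equivalent to the ifelse() above
hdd_zero <- pmax(15.5 - airTempImp, 0) / 24

# Variant from the question: mark hours warmer than 15.5 as NA instead
hdd_na <- ifelse(airTempImp > 15.5, NA, (15.5 - airTempImp) / 24)

# Assumption: daily totals are wanted, so sum the hourly values per date
# (with the NA variant, also pass na.rm = TRUE to aggregate())
daily <- aggregate(list(HDD = hdd_zero),
                   by = list(Date = as.Date(DateTimes)), FUN = sum)
daily
```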

Your solution is more complicated than it needs to be. Since you seem to want a data.frame anyway, you do not need to convert your data to a zoo object.

Just apply na_interpolation from imputeTS directly to the data.frame (imputeTS can handle many kinds of input, e.g. data.frame, vector, zoo, ts, xts, tibble and tsibble).

It's just:

library(imputeTS)
DateTimes <- as.POSIXct(c("2009-01-01 00:00:00", "2009-01-01 01:00:00", 
  "2009-01-01 02:00:00", "2009-01-01 03:00:00", "2009-01-01 04:00:00",
  "2009-01-01 05:00:00", "2009-01-01 06:00:00"))

MeanTemp <- c(0.8, 0.7, 0.7, NA, 0.8, 0.9, 1.1)
HourTemp <- data.frame(DateTimes, MeanTemp)

Imputed <- imputeTS::na_interpolation(HourTemp, option = "linear")

In this case imputeTS ignores the date column and fills in the missing values in the MeanTemp column.
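Because `na_interpolation()` fills MeanTemp in place, the Imputed/Observed flag from the question has to be computed from the original data frame. A minimal sketch of the full pipeline without zoo, assuming imputeTS 3.0 or later (where `na_interpolation` exists); `Filled` is a hypothetical name for the result.

```r
library(imputeTS)

DateTimes <- as.POSIXct(c(
  "2009-01-01 00:00:00", "2009-01-01 01:00:00",
  "2009-01-01 02:00:00", "2009-01-01 03:00:00",
  "2009-01-01 04:00:00", "2009-01-01 05:00:00", "2009-01-01 06:00:00"))
MeanTemp <- c(0.8, 0.7, 0.7, NA, 0.8, 0.9, 1.1)
HourTemp <- data.frame(DateTimes, MeanTemp)

# MeanTemp is filled in place, so take the flag from the original data
Filled <- na_interpolation(HourTemp, option = "linear")
Filled$Imputed <- ifelse(is.na(HourTemp$MeanTemp), "Imputed", "Observed")

# MeanTemp is an ordinary numeric column, so ifelse() works directly
Filled$HeatingDegreeDay <- ifelse(Filled$MeanTemp > 15.5, 0,
                                  (15.5 - Filled$MeanTemp) / 24)
Filled
```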

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit the site URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM