
Interpolation between 2 points for specified datetime values

I'm new to R. I have a CSV file with 4000+ rows of data, a sample of which is shown below. I can't figure out how to pass the Timestamp data to the approx function. My code is below.

library(ggplot2)
library(pmml) 
library(XML)
library(gmodels)
library(zoo)
library("data.table") 

df <- fread("C:/Users/myprofile/Desktop/test logs/test1.csv",
                  select = c("Timestamp", "Var1"))

head(df)

df[['Timestamp']] <- as.POSIXct(df[['Timestamp']],
                                  format = "%Y %m %d %H:%M:%S:%OS")

seq1 <- zoo(order.by=(as.POSIXlt(seq(min(df$Timestamp), max(df$Timestamp), by=5))))

I'm not sure how to use the approx() function with the Timestamp data. How can I interpolate "Var1" between any 2 points for the kind of data that I have?

I get this error:

seq1 <- zoo(order.by=(as.POSIXlt(seq(min(df$Timestamp), max(df$Timestamp), by=5))))
Error in seq.int(0, to0 - from, by) : 'to' must be a finite number

 dput(df)
structure(list(Timestamp = structure(c(1594146600, 1594146609, 
1594146610, 1594146612, 1594146613, 1594146614, 1594146615, 1594146616, 
1594146618, 1594146619, 1594146620, 1594146640, 1594146660, 1594146681, 
1594146701, 1594146721, 1594146741, 1594146761, 1594146782), class = c("POSIXct", 
"POSIXt"), tzone = "")), row.names = c(NA, -19L), .internal.selfref = <pointer: 0x000002aac2681ef0>, class = c("data.table", 
"data.frame"))

structure(list(Timestamp = structure(c(1594146600, 1594146609, 
1594146610, 1594146612, 1594146613, 1594146614, 1594146615, 1594146616, 
1594146618, 1594146619, 1594146620, 1594146640, 1594146660, 1594146681, 
1594146701, 1594146721, 1594146741, 1594146761, 1594146782), class = c("POSIXct", 
"POSIXt"), tzone = ""), Var1 = c(-0.02, -0.02, -0.01, 0.26, 0.48, 
0.63, 0.75, 0.86, 0.97, 1.2, 2.27, 4, 4.3, 3.02, 2.23, 1.79, 
1.62, 1.59, 1.63)), row.names = c(NA, -19L), class = "data.frame")


What do min(df$Timestamp) and max(df$Timestamp) return? You may have NAs in your data, in which case you need max(df$Timestamp, na.rm=TRUE). For the sequence to behave predictably, specify from, to and the time interval with an explicit unit instead of a bare 5 (for POSIXct a numeric by is taken as seconds). So: seq(from=min(df$Timestamp, na.rm=TRUE), to=max(df$Timestamp, na.rm=TRUE), by='5 sec')
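The check suggested above can be sketched as follows. The raw strings and the format are made up for illustration, since the actual CSV layout is not shown in the question; the point is only that one unparseable value becomes NA, and an NA endpoint is what makes seq fail with "'to' must be a finite number". A format string that does not match the file (the question's ends in "%S:%OS") would be one plausible way to end up with NAs.

```r
# Hypothetical raw strings; the real CSV contents are not shown in the question.
raw <- c("2020 07 07 18:30:00", "2020 07 07 18:30:09", "not a timestamp")

# With an explicit format, strings that do not match become NA (no error).
ts <- as.POSIXct(raw, format = "%Y %m %d %H:%M:%S", tz = "UTC")

anyNA(ts)        # the third string failed to parse
is.na(max(ts))   # the NA propagates into max(), so seq() has no finite 'to'

# Dropping NAs gives finite endpoints, and the unit of 'by' is explicit.
rng <- range(ts, na.rm = TRUE)
grid <- seq(from = rng[1], to = rng[2], by = "1 sec")
length(grid)
```

If anyNA(ts) is TRUE on the real data, comparing a few raw Timestamp strings against the format string is probably the first thing to check.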

1) Using the last dput output shown in the question (reproduced in the Note at the end), create the sequence of date/times, seq1, fixing the code in the question. Then use approx, convert the resulting list to a data frame, and read that data frame into a zoo object.

library(zoo)

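# Regular 5-second grid spanning the observed timestamps
rng <- range(df$Timestamp)
seq1 <- seq(rng[1], rng[2], 5)

# Linearly interpolate Var1 onto the grid; approx returns a list with x and y
dfi <- with(df, data.frame(approx(Timestamp, Var1, seq1)))
z <- read.zoo(dfi)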
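The same approx call also works for a single datetime instead of a whole grid, which is closer to "interpolate between any 2 points". A minimal sketch, using three rows from the question's dput output; df_small, query and est are names introduced here for illustration, and the time zone is fixed to UTC only to keep the example reproducible.

```r
# Three consecutive rows taken from the question's dput output.
df_small <- data.frame(
  Timestamp = as.POSIXct(c(1594146610, 1594146612, 1594146613),
                         origin = "1970-01-01", tz = "UTC"),
  Var1 = c(-0.01, 0.26, 0.48)
)

# Hypothetical query time: one second after the first row,
# i.e. halfway between the first two observations.
query <- df_small$Timestamp[1] + 1

# approx converts the POSIXct values to seconds and interpolates linearly.
est <- approx(x = df_small$Timestamp, y = df_small$Var1, xout = query)$y
est  # midpoint of -0.01 and 0.26
```

xout can hold several datetimes at once. With the default rule = 1, query times outside the observed range come back as NA.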

2) Alternatively, use na.approx. seq1 is from (1) above. Create a zoo object from df and merge it with a zero-width zoo object that carries only the time grid. The merge introduces NAs at the grid times that have no observation, and na.approx fills them. Then extract the values at the grid times.

library(zoo)

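zz <- read.zoo(df)   # observed series indexed by Timestamp
z0 <- zoo(, seq1)    # zero-width series holding only the time grid
na.approx(merge(zz, z0))[time(z0)]  # fill the NAs, keep only the grid times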

Note

df <-
structure(list(Timestamp = structure(c(1594146600, 1594146609, 
1594146610, 1594146612, 1594146613, 1594146614, 1594146615, 1594146616, 
1594146618, 1594146619, 1594146620, 1594146640, 1594146660, 1594146681, 
1594146701, 1594146721, 1594146741, 1594146761, 1594146782), class = c("POSIXct", 
"POSIXt"), tzone = ""), Var1 = c(-0.02, -0.02, -0.01, 0.26, 0.48, 
0.63, 0.75, 0.86, 0.97, 1.2, 2.27, 4, 4.3, 3.02, 2.23, 1.79, 
1.62, 1.59, 1.63)), row.names = c(NA, -19L), class = "data.frame")
