How to interpolate data in R
I am quite new to RStudio and I have a question.
I have the following data (Date; Time; Value):
02.01.11;11:00;576
02.01.11;11:05;552
02.01.11;11:10;672
02.01.11;11:15;720
02.01.11;11:20;336
02.01.11;11:25;408
02.01.11;11:30;288
02.01.11;11:35;228
02.01.11;11:40;288
02.01.11;11:45;288
02.01.11;11:50;288
02.01.11;11:55;312
02.01.11;12:00;180
02.01.11;12:05;120
02.01.11;12:10;120
02.01.11;12:15;228
02.01.11;12:20;276
02.01.11;12:25;228
02.01.11;12:30;444
02.01.11;12:35;612
02.01.11;12:40;300
02.01.11;12:45;288
02.01.11;12:50;300
02.01.11;12:55;336
02.01.11;13:00;240
02.01.11;13:05;252
02.01.11;13:10;192
02.01.11;13:15;180
02.01.11;13:20;192
02.01.11;13:25;432
02.01.11;13:30;912
02.01.11;13:35;960
02.01.11;13:40;936
02.01.11;13:45;1260
02.01.11;13:50;1008
For some calculations I need the data in 1-minute time frames. Can somebody help me find out how to interpolate the "missing" values so that they fit in with the existing ones?
I used this command to get the data frame:
library(readr)  # provides read_delim(), cols(), col_date(), col_time(), col_double()
df <- read_delim("~/values.txt", delim = ";", escape_double = FALSE,
                 col_types = cols(Date = col_date(format = "%d.%m.%y"),
                                  Value = col_double(),
                                  Time = col_time(format = "%H:%M")),
                 trim_ws = TRUE)
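Both approaches below need one combined date-time value per reading. As a base-R sketch that runs without the file, the first rows of the data can be read from inline text and the date and time pasted into a single POSIXct column. The inline subset, the column names, and tz = "UTC" are assumptions for illustration; the format string follows the question's own "%d.%m.%y".

```r
# First rows of the question's data, pasted inline so the sketch runs
# without ~/values.txt (assumed: no header row in the file)
raw <- "02.01.11;11:00;576
02.01.11;11:05;552
02.01.11;11:10;672
02.01.11;11:15;720"

dat <- read.table(text = raw, sep = ";", stringsAsFactors = FALSE,
                  col.names = c("Date", "Time", "Value"))

# Combine date and time into one timestamp (day.month.year, as in the
# question's col_date format); tz = "UTC" avoids daylight-saving gaps
dat$Datetime <- as.POSIXct(paste(dat$Date, dat$Time),
                           format = "%d.%m.%y %H:%M", tz = "UTC")
str(dat)
```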
To deal with minute data, I would recommend using the package xts and the function na.approx from the package zoo. In a nutshell, you need to create an empty xts object indexed by the complete sequence of minutes and merge it with your original data. Then you can use na.approx to approximate the missing values.
library(xts)  # also loads zoo, which provides na.approx()
# Initial data, not by minute
date_time_init <- Sys.time() + c(0, 3, 5, 8) * 60
df1 <- xts(1:4, date_time_init)
> df1
[,1]
2017-06-02 03:10:20 1
2017-06-02 03:13:20 2
2017-06-02 03:15:20 3
2017-06-02 03:18:20 4
#Create time sequence by minute
date_time_complete <- seq.POSIXt(from=min(date_time_init),
to=max(date_time_init),by="min")
#Merge initial data with new time sequence
df2 <- merge(df1,xts(,date_time_complete))
df1
2017-06-02 03:10:20 1
2017-06-02 03:11:20 NA
2017-06-02 03:12:20 NA
2017-06-02 03:13:20 2
2017-06-02 03:14:20 NA
2017-06-02 03:15:20 3
2017-06-02 03:16:20 NA
2017-06-02 03:17:20 NA
2017-06-02 03:18:20 4
na.approx(df2)
df1
2017-06-02 03:10:20 1.000000
2017-06-02 03:11:20 1.333333
2017-06-02 03:12:20 1.666667
2017-06-02 03:13:20 2.000000
2017-06-02 03:14:20 2.500000
2017-06-02 03:15:20 3.000000
2017-06-02 03:16:20 3.333333
2017-06-02 03:17:20 3.666667
2017-06-02 03:18:20 4.000000
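The same "merge onto a complete minute grid, then interpolate the gaps" idea can also be sketched in base R, without xts or zoo. This is a swapped-in alternative, not the answer's code: approx() stands in for na.approx(), and the data are the first three readings from the question (dates read as day.month.year, tz = "UTC" assumed).

```r
# First three readings from the question: 11:00, 11:05, 11:10
obs <- data.frame(
  time  = as.POSIXct("2011-01-02 11:00", tz = "UTC") + c(0, 5, 10) * 60,
  value = c(576, 552, 672)
)

# Complete 1-minute grid spanning the observed times
grid <- data.frame(time = seq(min(obs$time), max(obs$time), by = "1 min"))

# Left join: minutes without an observation get value = NA
filled <- merge(grid, obs, by = "time", all.x = TRUE)

# Replace each NA by linear interpolation between the neighbouring readings
# (times converted to seconds so approx() works on plain numbers)
known <- !is.na(filled$value)
filled$value[!known] <- approx(as.numeric(filled$time[known]),
                               filled$value[known],
                               xout = as.numeric(filled$time[!known]))$y
print(filled)
```

Between 11:00 (576) and 11:05 (552) the value falls by 24 / 5 = 4.8 per minute, so 11:01 becomes 571.2; between 11:05 and 11:10 it rises by 24 per minute.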
Let's assume you (a) know how to read in data from a text file with semicolon delimiters (I would advise using stringsAsFactors=FALSE) and (b) know how to join columns of text with paste.
So, with a data frame like
> str(dat)
'data.frame': 35 obs. of 3 variables:
$ Dates: chr "02.01.11" "02.01.11" "02.01.11" "02.01.11" ...
$ Times: chr "11:00" "11:05" "11:10" "11:15" ...
$ Vals : int 576 552 672 720 336 408 288 228 288 288 ...
One can use the function approxfun (from the stats package that ships with R) and give the paste()-ed Dates and Times to as.POSIXct to form the backbone of the solution:
dat$Datetimes <- with(dat, as.POSIXct( paste(Dates,Times), format="%m.%d.%y %H:%M") )
Now create a new data frame starting with a sequence of "minute points" that spans the range of the times (seq() on POSIXct values dispatches to seq.POSIXt):
dat2 <- data.frame(Mins = seq(min(dat$Datetimes), max(dat$Datetimes), by="1 min") )
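As a quick check on the size of that grid: the readings run from 11:00 to 13:50, a span of 170 minutes, so a 1-minute sequence that includes both endpoints has 171 points, which is the row count str(dat2) reports. The date and tz = "UTC" below are illustrative assumptions; only the clock times come from the question.

```r
# First and last reading times from the question (date is illustrative)
start <- as.POSIXct("2011-01-02 11:00", tz = "UTC")
end   <- as.POSIXct("2011-01-02 13:50", tz = "UTC")

# One point per minute, both endpoints included
mins <- seq(start, end, by = "1 min")
length(mins)   # 171 = 170 minutes of span + 1
```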
Then use an expression of the form approxfun(<inner args>)(<outer args>) to make the linear interpolations in the missing intervals. approxfun returns a function built from the <inner args>, and the "minute points" are then passed to that function as the <outer args>:
dat2$interp <- approxfun(dat$Datetimes, dat$Vals)(dat2$Mins)
str(dat2)
#----------
'data.frame': 171 obs. of 2 variables:
$ Mins : POSIXct, format: "2011-02-01 11:00:00" "2011-02-01 11:01:00" ...
$ interp: num 576 571 566 562 557 ...
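To make the approxfun(<inner args>)(<outer args>) pattern concrete, here is a sketch on just the first two readings (576 at 11:00, 552 at 11:05). The date and tz = "UTC" are assumptions for illustration. Each interpolated value is y0 + (y1 - y0) * (t - t0) / (t1 - t0), so 11:01 gives 576 - 24 * 1/5 = 571.2, which str() above shows rounded as 571.

```r
# Two known readings from the question: 11:00 -> 576, 11:05 -> 552
t_known <- as.POSIXct(c("2011-01-02 11:00", "2011-01-02 11:05"), tz = "UTC")
v_known <- c(576, 552)

# approxfun() builds an interpolating function from the known points ...
f <- approxfun(t_known, v_known)

# ... which is then evaluated at the 1-minute points in between
t_new <- seq(t_known[1], t_known[2], by = "1 min")
v_new <- f(t_new)
print(v_new)   # 576.0 571.2 566.4 561.6 556.8 552.0

# Same numbers from the linear formula y0 + (y1 - y0) * (t - t0) / (t1 - t0)
manual <- 576 + (552 - 576) * (0:5) / 5
stopifnot(isTRUE(all.equal(v_new, manual)))
```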
To "see" the results on the same scale:
with(dat, plot(Datetimes,Vals,col="red") )
with(dat2, points(Mins,interp ,cex=0.2))