
How to interpolate data in R

I am quite new to RStudio and I have a question:

I have the following data (Date; Time; Value):

02.01.11;11:00;576
02.01.11;11:05;552
02.01.11;11:10;672
02.01.11;11:15;720
02.01.11;11:20;336
02.01.11;11:25;408
02.01.11;11:30;288
02.01.11;11:35;228
02.01.11;11:40;288
02.01.11;11:45;288
02.01.11;11:50;288
02.01.11;11:55;312
02.01.11;12:00;180
02.01.11;12:05;120
02.01.11;12:10;120
02.01.11;12:15;228
02.01.11;12:20;276
02.01.11;12:25;228
02.01.11;12:30;444
02.01.11;12:35;612
02.01.11;12:40;300
02.01.11;12:45;288
02.01.11;12:50;300
02.01.11;12:55;336
02.01.11;13:00;240
02.01.11;13:05;252
02.01.11;13:10;192
02.01.11;13:15;180
02.01.11;13:20;192
02.01.11;13:25;432
02.01.11;13:30;912
02.01.11;13:35;960
02.01.11;13:40;936
02.01.11;13:45;1260
02.01.11;13:50;1008

For some calculations I need the data in 1-minute time frames. Can somebody help me find out how to interpolate the "missing" values so that they fit in with the existing ones?

I used this command to get the data frame:

library(readr)

df <- read_delim("~/values.txt", ";", escape_double = FALSE,
                 col_types = cols(Date = col_date(format = "%d.%m.%y"),
                                  Value = col_double(),
                                  Time = col_time(format = "%H:%M")),
                 trim_ws = TRUE)
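If readr is not available, a base-R sketch of the same import might look like the block below. It assumes the file has no header row (as in the sample shown), uses the same day.month.year format as the `read_delim` call, and inlines only the first three sample lines via `text=` for illustration. Using `tz = "UTC"` is an assumption made here so that daylight-saving transitions cannot create gaps or duplicates in the timestamps.

```r
# Base-R sketch: read semicolon-delimited Date;Time;Value lines.
# Assumption: no header row; only the first few sample lines are inlined here.
txt <- "02.01.11;11:00;576
02.01.11;11:05;552
02.01.11;11:10;672"

df <- read.table(text = txt, sep = ";",
                 col.names = c("Date", "Time", "Value"),
                 colClasses = c("character", "character", "numeric"))

# Combine date and time into one POSIXct timestamp (day.month.year, as in the question)
df$DateTime <- as.POSIXct(paste(df$Date, df$Time),
                          format = "%d.%m.%y %H:%M", tz = "UTC")
str(df)
```

Keeping Date and Time as character (rather than factors) makes the `paste()` step straightforward.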

To deal with minute data, I would recommend using the package xts and the function na.approx from the package zoo. In a nutshell, you create an empty xts object indexed by the complete sequence of minutes and merge it with your original data. Then you can use na.approx to approximate the missing values.

library(xts)  # also attaches zoo, which provides na.approx

# Initial data, not by minute (fixed start time so the output is reproducible)
start <- as.POSIXct("2017-06-02 03:10:20")
date_time_init <- start + c(0, 3, 5, 8) * 60
df1 <- xts(1:4, date_time_init)
df1
                    [,1]
2017-06-02 03:10:20    1
2017-06-02 03:13:20    2
2017-06-02 03:15:20    3
2017-06-02 03:18:20    4

# Create time sequence by minute
date_time_complete <- seq.POSIXt(from = min(date_time_init),
                                 to = max(date_time_init), by = "min")

# Merge initial data with the new time sequence (an empty xts holds only the index)
df2 <- merge(df1, xts(, date_time_complete))
df2
                    df1
2017-06-02 03:10:20   1
2017-06-02 03:11:20  NA
2017-06-02 03:12:20  NA
2017-06-02 03:13:20   2
2017-06-02 03:14:20  NA
2017-06-02 03:15:20   3
2017-06-02 03:16:20  NA
2017-06-02 03:17:20  NA
2017-06-02 03:18:20   4

# Linearly interpolate the NAs along the time index
na.approx(df2)
                         df1
2017-06-02 03:10:20 1.000000
2017-06-02 03:11:20 1.333333
2017-06-02 03:12:20 1.666667
2017-06-02 03:13:20 2.000000
2017-06-02 03:14:20 2.500000
2017-06-02 03:15:20 3.000000
2017-06-02 03:16:20 3.333333
2017-06-02 03:17:20 3.666667
2017-06-02 03:18:20 4.000000
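As a cross-check that na.approx is doing plain linear interpolation in time, the same numbers can be reproduced with base R's `approx()` and no extra packages. This is a sketch under the assumption of the same four observations and a fixed start time; `tz = "UTC"` and the variable names are illustrative choices.

```r
# Base-R cross-check of the na.approx result (no xts/zoo needed).
start <- as.POSIXct("2017-06-02 03:10:20", tz = "UTC")
t_obs <- start + c(0, 3, 5, 8) * 60   # irregular observation times
y_obs <- 1:4                          # observed values

# One point per minute over the observed range
t_min <- seq(min(t_obs), max(t_obs), by = "min")

# Linear interpolation on the time axis; POSIXct is converted to seconds
res <- approx(as.numeric(t_obs), y_obs, xout = as.numeric(t_min))$y
round(res, 6)
# values: 1, 1.333333, 1.666667, 2, 2.5, 3, 3.333333, 3.666667, 4
```

Because the interpolation is done on the numeric time axis, unequal gaps between observations (3, 2 and 3 minutes here) are weighted correctly.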

Let's assume you (a) know how to read in data from a text file with semicolon delimiters (I would advise using stringsAsFactors = FALSE), and (b) know how to join columns of text with paste. Then, with a data frame like

> str(dat)
'data.frame':   35 obs. of  3 variables:
 $ Dates: chr  "02.01.11" "02.01.11" "02.01.11" "02.01.11" ...
 $ Times: chr  "11:00" "11:05" "11:10" "11:15" ...
 $ Vals : int  576 552 672 720 336 408 288 228 288 288 ...

One can use the base function approxfun, and pass the paste()-ed Dates and Times to as.POSIXct to form the backbone of the solution:

# 02.01.11 is day.month.year, as in the question's read_delim call
dat$Datetimes <- with(dat, as.POSIXct(paste(Dates, Times), format = "%d.%m.%y %H:%M"))

Now create a new data frame starting with a sequence of "minute-points", made with seq() (which dispatches to seq.POSIXt), that spans the range of the times:

dat2 <- data.frame(Mins = seq(min(dat$Datetimes), max(dat$Datetimes), by = "1 min"))

Then use an expression of the form approxfun(<inner args>)(<outer args>) to make the linear interpolations in the missing intervals. approxfun returns a function built from the <inner args>, and the "minute-points" are passed to that function as the <outer args>:

dat2$interp <- approxfun(dat$Datetimes, dat$Vals)(dat2$Mins)
str(dat2)
#----------
'data.frame':   171 obs. of  2 variables:
 $ Mins  : POSIXct, format: "2011-01-02 11:00:00" "2011-01-02 11:01:00" ...
 $ interp: num  576 571 566 562 557 ...
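To make the inner/outer call pattern concrete, here is a minimal sketch that uses only the first two observations of the question's data (11:00 gives 576, 11:05 gives 552). Converting the timestamps with `as.numeric()` and using `tz = "UTC"` are choices made here for explicitness, not part of the answer above.

```r
# First two observations from the question's data
t_obs <- as.POSIXct(c("2011-01-02 11:00", "2011-01-02 11:05"), tz = "UTC")
v_obs <- c(576, 552)

# Inner call: build an interpolating function from the known points
f <- approxfun(as.numeric(t_obs), v_obs)

# Outer call: evaluate that function at every minute in the range
mins <- seq(min(t_obs), max(t_obs), by = "1 min")
interp <- f(as.numeric(mins))
interp
# values: 576.0 571.2 566.4 561.6 556.8 552.0
```

Each minute moves the value by (552 - 576) / 5 = -4.8, which is why `str(dat2)` above shows 576 571 566 562 557 after rounding.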

To "see" the results on the same scale:

with(dat, plot(Datetimes,Vals,col="red") )
with(dat2, points(Mins,interp ,cex=0.2))
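Besides the visual check, a numeric sanity check is possible: a linear interpolation should pass exactly through the original observations at the 5-minute marks. The sketch below assumes a hypothetical subset of the data (the first four observations) with UTC timestamps, and reuses the dat/dat2 names from the answer.

```r
# Hypothetical subset: first four observations of the question's data
dat <- data.frame(
  Datetimes = as.POSIXct("2011-01-02 11:00", tz = "UTC") + c(0, 5, 10, 15) * 60,
  Vals = c(576, 552, 672, 720)
)
dat2 <- data.frame(Mins = seq(min(dat$Datetimes), max(dat$Datetimes), by = "1 min"))
dat2$interp <- approxfun(as.numeric(dat$Datetimes), dat$Vals)(as.numeric(dat2$Mins))

# Rows of dat2 whose timestamp coincides with an original observation
on_grid <- as.numeric(dat2$Mins) %in% as.numeric(dat$Datetimes)
all.equal(dat2$interp[on_grid], dat$Vals)  # TRUE if the fit hits every observation
nrow(dat2)                                 # 16 one-minute points for a 15-minute span
```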

