Linear interpolation in time series in R
I have a data set with body temperatures taken every minute for 8 hours. I removed aberrant data and now have NA values, sometimes just one alone, and sometimes more than 10 in a row.
I would like to replace the missing data using linear interpolation.
I tried different things, but I couldn't make 'approx' work (NA values stayed NA...) or even find a way to tell R to use the value before (same column, minus 1 row) or the value after (same column, + 1 row). In this example, where I try to replace just one NA, the [+1] and [-1] are just read as [1], so it doesn't work:
df$var1_lini <- ifelse (!is.na(df$var1),df$var1,
ifelse (!is.na(df$var[+1]),df$var[-1]+(df$var1[-1]+df$var1[+1])/2,NA))
I'm open to any form of solution. I am a beginner, so a detailed answer would be great!
Thank you
Eve
Another approach is to build a linear model using the existing data you have and then use that model's predictions to replace the NAs.
A simple example to help you understand is this:
library(ggplot2)
# create example dataset
df = data.frame(value = mtcars$qsec,
                time = 1:nrow(mtcars))
# replace some values with NA (you can experiment with different values)
df$value[c(5,12,17,18,30)] = NA
# build linear model based on existing data (model ignores rows with NAs)
m = lm(value ~ time, data = df)
# add predictions as a column
df$pred_value = predict(m, newdata = df)
# replace (only) NAs with predictions
df$interp_value = ifelse(is.na(df$value), df$pred_value, df$value)
# plot existing and interpolated data
ggplot() +
  geom_point(data = df, aes(time, value), size = 5) +
  geom_point(data = df, aes(time, interp_value), col = "red")
In the plot, the black points represent the existing values and the red points represent the existing values plus the NA replacements.
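For comparison, a single lm() fit draws one straight line through the whole series, whereas linear interpolation connects the nearest observed values on either side of each gap. Below is a minimal base R sketch of the latter, using approx() on made-up temperatures. The vector names and values are assumptions for illustration, not the asker's data.

```r
# Hypothetical example data: one reading per minute, with single and multi-row gaps
temp   <- c(36.5, 36.6, NA, 36.8, NA, NA, NA, 37.2, 37.1)
minute <- seq_along(temp)

# Use only the observed points as the basis for interpolation
ok <- !is.na(temp)

# approx() returns a list; the interpolated values at the positions
# requested through xout are in its $y component
filled <- approx(x = minute[ok], y = temp[ok],
                 xout = minute, method = "linear")$y

# Keep the result next to the original so the two can be compared
data.frame(minute, temp, filled)
```

With the default rule = 1, positions before the first or after the last observed value stay NA, because there is nothing on one side to interpolate from.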
The easiest way to solve this is to use a package that has functions for missing data replacement, like imputeTS, forecast, or zoo.
The process of replacing missing values with reasonable estimates is also called 'imputation' in statistics.
For interpolating a time series, vector or data.frame it is as easy as this:
library("imputeTS")
# imputeTS 3.0 and later name this function na_interpolation()
na.interpolation(yourDataWithNAs)
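The zoo package mentioned above offers a similar one-liner, na.approx(). Here is a minimal sketch, assuming zoo is installed and using made-up temperatures (the vector name and values are illustrative only):

```r
library(zoo)

# Hypothetical readings with NAs at both ends and inside the series
temp <- c(NA, 36.6, NA, 36.8, NA, NA, NA, 37.2, NA)

# na.rm = FALSE keeps leading and trailing NAs, so the result has the
# same length as the input and can be stored as a data frame column
filled <- na.approx(temp, na.rm = FALSE)

filled
```

na.approx() also accepts a maxgap argument, which leaves runs of NAs longer than the given length untouched. That may be worth considering when more than 10 consecutive minutes are missing and a straight line across the gap is hard to justify.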
Keep in mind that there are also other imputation methods beyond linear interpolation, e.g. moving average imputation or seasonality-based imputation. Depending on the problem, another method may provide better results.
(Here are some further explanations: Time Series Imputation)
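As a rough illustration of one of these alternatives, the sketch below compares linear interpolation with moving average imputation. It assumes imputeTS 3.0 or later (where the functions are named na_interpolation() and na_ma()) and uses made-up temperatures. The window size k = 2 is an arbitrary choice for the example, not a recommendation.

```r
library(imputeTS)

# Hypothetical readings with a single gap and a two-row gap
temp <- c(36.5, 36.6, NA, 36.8, 36.9, NA, NA, 37.2, 37.1)

# Linear interpolation between the neighbouring observed values
filled_linear <- na_interpolation(temp, option = "linear")

# Moving average imputation: each NA is replaced by the unweighted mean
# of the observed values in a window of k values on each side
filled_ma <- na_ma(temp, k = 2, weighting = "simple")

# Compare the two results side by side
data.frame(temp, filled_linear, filled_ma)
```

Both functions leave the observed values unchanged and only fill the NA positions, so the columns differ only where temp is missing.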