[英]Adding missing rows to time series data in r
我有一個 GPS 坐標數據集,但有時會丟失。 我想用適當的時間戳和 NA 為 lat 和 lon 列添加缺失的行
這是當前數據的格式:
timestamp lon lat id date time
<time> <dbl> <dbl> <int> <date> <chr>
1 2012-08-01 03:59:00 36.92288 0.3508941 1 2012-08-01 03:59:00
2 2012-08-01 03:59:01 36.92288 0.3508901 1 2012-08-01 03:59:01
3 2012-08-01 03:59:02 36.92288 0.3508868 1 2012-08-01 03:59:02
4 2012-08-01 03:59:03 36.92288 0.3508828 1 2012-08-01 03:59:03
5 2012-08-01 03:59:05 36.92288 0.3508845 1 2012-08-01 03:59:05
6 2012-08-01 03:59:06 36.92288 0.3508866 1 2012-08-01 03:59:06
7 2012-08-01 03:59:07 36.92288 0.3508885 1 2012-08-01 03:59:07
8 2012-08-01 03:59:08 36.92288 0.3508903 1 2012-08-01 03:59:08
9 2012-08-01 03:59:09 36.92288 0.3508915 1 2012-08-01 03:59:09
正如你所看到的,時間戳 2012-08-01 03:59:04 丟失了,我希望最后有這樣的東西
timestamp lon lat id date time
<time> <dbl> <dbl> <int> <date> <chr>
1 2012-08-01 03:59:00 36.92288 0.3508941 1 2012-08-01 03:59:00
2 2012-08-01 03:59:01 36.92288 0.3508901 1 2012-08-01 03:59:01
3 2012-08-01 03:59:02 36.92288 0.3508868 1 2012-08-01 03:59:02
4 2012-08-01 03:59:03 36.92288 0.3508828 1 2012-08-01 03:59:03
5 2012-08-01 03:59:04 NA NA 1 2012-08-01 03:59:04
6 2012-08-01 03:59:05 36.92288 0.3508845 1 2012-08-01 03:59:05
7 2012-08-01 03:59:06 36.92288 0.3508866 1 2012-08-01 03:59:06
8 2012-08-01 03:59:07 36.92288 0.3508885 1 2012-08-01 03:59:07
9 2012-08-01 03:59:08 36.92288 0.3508903 1 2012-08-01 03:59:08
10 2012-08-01 03:59:09 36.92288 0.3508915 1 2012-08-01 03:59:09
對此的任何幫助將不勝感激!
使用rbind.fill
從plyr
包:
ts_data <- read.table(text="timestamp,lon,lat,id,date,time
2012-08-01 03:59:00,36.92288,0.3508941,1,2012-08-01,03:59:00
2012-08-01 03:59:01,36.92288,0.3508901,1,2012-08-01,03:59:01
2012-08-01 03:59:02,36.92288,0.3508868,1,2012-08-01,03:59:02
2012-08-01 03:59:03,36.92288,0.3508828,1,2012-08-01,03:59:03
2012-08-01 03:59:05,36.92288,0.3508845,1,2012-08-01,03:59:05
2012-08-01 03:59:06,36.92288,0.3508866,1,2012-08-01,03:59:06
2012-08-01 03:59:07,36.92288,0.3508885,1,2012-08-01,03:59:07
2012-08-01 03:59:08,36.92288,0.3508903,1,2012-08-01,03:59:08
2012-08-01 03:59:09,36.92288,0.3508915,1,2012-08-01,03:59:09",
header=T, stringsAsFactors=F, sep=",")
##Convert timestamp to datetime
ts_data$timestamp <- ts_data$timestamp <- as.POSIXct(strftime(ts_data$timestamp))
##Get full sequence
full_sequence <- seq(from=min(ts_data$timestamp),
to=max(ts_data$timestamp), by="s")
##Grab the missing sequence
missing_sequence <- full_sequence[!(full_sequence %in% ts_data$timestamp)]
##Make a data.frame out of the missing sequence
missing_df <- data.frame(timestamp = missing_sequence,
id = rep(1,length(missing_sequence)),
date = strftime(missing_sequence, format = "%Y-%m-%d"),
time=strftime(missing_sequence, format = "%H:%M:%S"))
##Combine the two
new_ts_data <- plyr::rbind.fill(ts_data, missing_df)
##Order by timestamp
new_ts_data <- new_ts_data[order(new_ts_data$timestamp),]
期望的輸出:
timestamp lon lat id date time
2012-08-01 03:59:00 36.92288 0.3508941 1 2012-08-01 03:59:00
2012-08-01 03:59:01 36.92288 0.3508901 1 2012-08-01 03:59:01
2012-08-01 03:59:02 36.92288 0.3508868 1 2012-08-01 03:59:02
2012-08-01 03:59:03 36.92288 0.3508828 1 2012-08-01 03:59:03
2012-08-01 03:59:04 NA NA 1 2012-08-01 03:59:04
2012-08-01 03:59:05 36.92288 0.3508845 1 2012-08-01 03:59:05
2012-08-01 03:59:06 36.92288 0.3508866 1 2012-08-01 03:59:06
2012-08-01 03:59:07 36.92288 0.3508885 1 2012-08-01 03:59:07
2012-08-01 03:59:08 36.92288 0.3508903 1 2012-08-01 03:59:08
2012-08-01 03:59:09 36.92288 0.3508915 1 2012-08-01 03:59:09
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.