Replace values in dataframe by matching dates of different lengths
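I have 52 time-series files, each covering a different span of dates. They all share the same end date (31 January 2017), but all 52 data frames have different start dates.
'data': nRows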
Date FLOW Modelled
01-01-1992 1.856 NA
02-01-1992 1.523 NA
03-01-1992 2.623 NA
04-01-1992 3.679 NA
...
31-12-2017
I also have a file of simulated FLOW values, with one column for each data set.
'Simulated': 20819 rows, 53 columns (including Date).
Date 1 2 3 ..52
01-01-1961 1.856 2.889 2.365
02-01-1961 1.523 3.536 4.624
03-01-1961 2.536 2.452 6.352
04-01-1961 3.486 4.267 3.685
...
31-12-2017
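My question is that I want to select each column from the simulated data (for example, column 1 corresponds to 'data' file 1) and fill the Modelled column of 'data' with the simulated values. Ideally this would loop through the 52 files based on a list of file names.
The problem I am facing is that when I use left_join I get the error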
e.g. replacement has 20819 rows, data has 9657
when 'data' is shorter than 'Simulated', and
e.g. replacement has 20819 rows, data has 22821
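when 'data' is longer than 'Simulated'.
I tried left_join from the dplyr package and had no luck, because the dates do not match between the 'data' and 'Simulated' data frames.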
library(dplyr)
df <-left_join(data, Simulated, by = c("Date"),all.x=TRUE)
I have already converted the dates using something like
Simulated$Date <- as.Date(with(Simulated, paste(Year, Month, Day, sep="-")), "%Y-%m-%d")
but I still get the following error when using left_join:
cannot join a Date object with an object that is not a Date object
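This error typically means the Date column is of class Date in one data frame and character (or factor) in the other. A minimal sketch of aligning the classes before joining is given here; the miniature data frames, their values, and the column name X1 are assumptions made up for illustration, not the asker's real data.

```r
library(dplyr)

# Hypothetical miniature versions of one 'data' file and the 'Simulated' table
data <- data.frame(Date = c("01-01-1992", "02-01-1992", "03-01-1992"),
                   FLOW = c(1.856, 1.523, 2.623),
                   Modelled = NA_real_,
                   stringsAsFactors = FALSE)
Simulated <- data.frame(Date = as.Date(c("1991-12-31", "1992-01-01", "1992-01-02")),
                        X1 = c(0.9, 1.7, 1.6))

# Give both Date columns the same class before joining
data$Date <- as.Date(data$Date, format = "%d-%m-%Y")

# Drop the empty Modelled column, then pull in the simulated column for this file;
# dates missing from Simulated simply yield NA
df <- data %>%
  select(-Modelled) %>%
  left_join(Simulated %>% select(Date, Modelled = X1), by = "Date")
```

Because left_join keeps every row of its first argument, the result has as many rows as 'data' whether 'Simulated' covers a longer or a shorter period.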
A solution can be implemented with tidyverse and read.table. First read all the data frames from the files in the list, then combine them into a single data frame with dplyr::bind_rows.
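library(tidyverse)
# Get the list of data files; the pattern assumes names like File1.txt (as in the
# sample data below) and keeps simulated.txt out of the list
filelist <- list.files(path = ".", pattern = "^File\\d+\\.txt$", full.names = TRUE)
# Read all files into a list
ll <- lapply(filelist, FUN = read.table, header = TRUE, stringsAsFactors = FALSE)
# Name each element by its file number so that df_num matches the simulated column
# even though list.files sorts alphabetically (File10 comes before File2)
names(ll) <- gsub("\\D", "", basename(filelist))
# Read the file containing the simulated data (columns 1, 2, ... are read as X1, X2, ...)
simulated <- read.table(file = "simulated.txt", header = TRUE, stringsAsFactors = FALSE)
# Convert the simulated data to long format, then join it with the other data frames
simulated %>% mutate(Date = as.Date(Date, format = "%d-%m-%Y")) %>%
  gather(df_num, SIM_FLOW, -Date) %>%
  mutate(df_num = gsub("X(\\d+)", "\\1", df_num)) %>%
  right_join(bind_rows(ll, .id = "df_num") %>% mutate(Date = as.Date(Date, format = "%d-%m-%Y")),
             by = c("df_num", "Date"))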
# Date df_num SIM_FLOW FLOW Modelled
# 1 1992-01-01 1 1.86 1.86 NA
# 2 1992-01-02 1 NA 1.52 NA
# 3 1992-01-03 1 NA 2.62 NA
# 4 1992-01-04 1 NA 3.68 NA
# 5 1993-01-01 2 NA 11.86 NA
# 6 1993-01-02 2 3.54 11.52 NA
# 7 1993-01-03 2 NA 12.62 NA
# 8 1993-01-04 2 NA 13.68 NA
# 9 1994-01-01 3 NA 111.86 NA
# 10 1994-01-02 3 NA 111.52 NA
# 11 1994-01-03 3 6.35 112.62 NA
# 12 1994-01-04 3 NA 113.68 NA
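The joined result still leaves Modelled empty, with the simulated values in SIM_FLOW. As a possible follow-up, the sketch below copies SIM_FLOW into Modelled and splits the rows back into one data frame per file. The names joined, filled and per_file are hypothetical, and the small data frame only mimics the shape of the output above.

```r
library(dplyr)

# Hypothetical joined result with the same columns as the output above
joined <- data.frame(
  Date = as.Date(c("1992-01-01", "1992-01-02", "1993-01-01", "1993-01-02")),
  df_num = c("1", "1", "2", "2"),
  SIM_FLOW = c(1.856, NA, NA, 3.536),
  FLOW = c(1.856, 1.523, 11.856, 11.523),
  Modelled = NA_real_,
  stringsAsFactors = FALSE
)

# Copy the simulated values into Modelled and drop the helper column
filled <- joined %>%
  mutate(Modelled = SIM_FLOW) %>%
  select(df_num, Date, FLOW, Modelled)

# Split back into a list with one data frame per original file,
# named by df_num and without the df_num column itself
per_file <- split(select(filled, -df_num), filled$df_num)
```

Each element of per_file has the original Date, FLOW and Modelled layout, so it could be written back out with write.table if separate files are needed.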
Data:
simulated.txt
Date 1 2 3
01-01-1992 1.856 2.889 2.365
02-01-1993 1.523 3.536 4.624
03-01-1994 2.536 2.452 6.352
04-01-1902 3.486 4.267 3.685
File1.txt
Date FLOW Modelled
01-01-1992 1.856 NA
02-01-1992 1.523 NA
03-01-1992 2.623 NA
04-01-1992 3.679 NA
File2.txt
Date FLOW Modelled
01-01-1993 11.856 NA
02-01-1993 11.523 NA
03-01-1993 12.623 NA
04-01-1993 13.679 NA
File3.txt
Date FLOW Modelled
01-01-1994 111.856 NA
02-01-1994 111.523 NA
03-01-1994 112.623 NA
04-01-1994 113.679 NA