Match dates from list of data frames in R
I have a list of 100+ time series data frames, my.list, with daily observations for each product in its own data frame. Some observations are missing, with no row recorded for that date. I would like to update each data frame in this list so that it shows the date with NA for Sales if it has no record for that date.
Dates:
start = as.Date('2016/04/08')
full <- seq(start, by='1 days', length=10)
Sample time series data:
d1 <- data.frame(Date = seq(start, by ='2 days',length=5), Sales = c(5,10,15,20,25))
d2 <- data.frame(Date = seq(start, by= '1 day', length=10),Sales = c(1, 2, 3,4,5,6,7,8,9,10))
my.list <- list(d1, d2)
I want to merge all full date values into each data frame, and if no match exists then Sales is NA:
my.list
[[d1]]
Date Sales
2016-04-08 5
2016-04-09 NA
2016-04-10 10
2016-04-11 NA
2016-04-12 15
2016-04-13 NA
2016-04-14 20
2016-04-15 NA
2016-04-16 25
2016-04-17 NA
[[d2]]
Date Sales
2016-04-08 1
2016-04-09 2
2016-04-10 3
2016-04-11 4
2016-04-12 5
2016-04-13 6
2016-04-14 7
2016-04-15 8
2016-04-16 9
2016-04-17 10
If I understand correctly, the OP wants to update each of the data frames in my.list to contain one row for each date given in the vector of dates full.
In base R, merge() can be used, as already mentioned by Hack-R. However, the answer below expands this to work on all data frames in the list:
# create a data frame from the vector of full dates
full.df <- data.frame(Date = full)
# apply merge on each dataframe in the list
lapply(my.list, merge, y = full.df, all.y = TRUE)
[[1]]
         Date Sales
1  2016-04-08     5
2  2016-04-09    NA
3  2016-04-10    10
4  2016-04-11    NA
5  2016-04-12    15
6  2016-04-13    NA
7  2016-04-14    20
8  2016-04-15    NA
9  2016-04-16    25
10 2016-04-17    NA

[[2]]
         Date Sales
1  2016-04-08     1
2  2016-04-09     2
3  2016-04-10     3
4  2016-04-11     4
5  2016-04-12     5
6  2016-04-13     6
7  2016-04-14     7
8  2016-04-15     8
9  2016-04-16     9
10 2016-04-17    10
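Note that lapply() returns a new list, so the result has to be assigned back to my.list to actually update it. The following is a minimal sketch of that step with a quick sanity check. The list names d1 and d2 and the explicit by = "Date" argument are assumptions added for illustration; they are not part of the answer's code.

```r
# Rebuild the sample data from the question
start <- as.Date("2016-04-08")
full  <- seq(start, by = "1 day", length.out = 10)
d1 <- data.frame(Date = seq(start, by = "2 days", length.out = 5),
                 Sales = c(5, 10, 15, 20, 25))
d2 <- data.frame(Date = seq(start, by = "1 day", length.out = 10),
                 Sales = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))
my.list <- list(d1 = d1, d2 = d2)  # names are an assumption, for readability

# Data frame holding every date that each product should have
full.df <- data.frame(Date = full)

# Assign the result back so that my.list itself is updated;
# all.y = TRUE keeps every row of full.df and fills Sales with NA
my.list <- lapply(my.list, function(df) {
  merge(df, full.df, by = "Date", all.y = TRUE)
})

# Sanity checks: rows per data frame and number of filled-in NA values
sapply(my.list, nrow)
sapply(my.list, function(df) sum(is.na(df$Sales)))
```

Naming the join column explicitly is a defensive choice: without by, merge() joins on all column names the two data frames have in common, which is only Date here but might not be in other data.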
The answer assumes that full covers the overall range of Date of all data frames in the list.
To avoid any mishaps, the overall range of Date can be retrieved from the available data in my.list:
overall_date_range <- Reduce(range, lapply(my.list, function(x) range(x$Date)))
full <- seq(overall_date_range[1], overall_date_range[2], by = "1 days")
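To see why deriving the range from the data can matter, here is a small sketch with a made-up case in which one data frame starts earlier than the others. The data frame d3 and its values are hypothetical and only serve to illustrate the point.

```r
# Shortened version of d1 from the question
d1 <- data.frame(Date = as.Date("2016-04-08") + c(0, 2, 4),
                 Sales = c(5, 10, 15))
# Hypothetical product whose records start two days earlier
d3 <- data.frame(Date = as.Date("2016-04-06") + c(0, 5),
                 Sales = c(7, 8))
my.list <- list(d1, d3)

# Range of each data frame, then the range across all of them
overall_date_range <- Reduce(range, lapply(my.list, function(x) range(x$Date)))

# One entry per day between the earliest and the latest date found
full <- seq(overall_date_range[1], overall_date_range[2], by = "1 day")
full
```

With a hand-written full starting on 2016-04-08, the earlier record of d3 would be dropped by merge(..., all.y = TRUE), since only the rows of the date data frame are guaranteed to be kept.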
rbindlist()
Alternatively, the list of data frames, which are identical in structure, can be stored in one large data frame. An additional attribute indicates which product each row belongs to. The homogeneous structure simplifies subsequent operations.
The code below uses the rbindlist() function from the data.table package to create a large data.table. CJ() (cross join) creates all combinations of dates and product ids, which are then joined to fill in the missing dates:
library(data.table)
all_products <- rbindlist(my.list, idcol = "product.id")[
CJ(product.id = unique(product.id), Date = seq(min(Date), max(Date), by = "1 day")),
on = .(Date, product.id)]
all_products
    product.id       Date Sales
 1:          1 2016-04-08     5
 2:          1 2016-04-09    NA
 3:          1 2016-04-10    10
 4:          1 2016-04-11    NA
 5:          1 2016-04-12    15
 6:          1 2016-04-13    NA
 7:          1 2016-04-14    20
 8:          1 2016-04-15    NA
 9:          1 2016-04-16    25
10:          1 2016-04-17    NA
11:          2 2016-04-08     1
12:          2 2016-04-09     2
13:          2 2016-04-10     3
14:          2 2016-04-11     4
15:          2 2016-04-12     5
16:          2 2016-04-13     6
17:          2 2016-04-14     7
18:          2 2016-04-15     8
19:          2 2016-04-16     9
20:          2 2016-04-17    10
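For readers who cannot use data.table, the same cross-join idea can be roughly sketched in base R. This swaps in expand.grid() and merge() for CJ() and the data.table join, so it is an illustration of the idea rather than the answer's method; the variable names stacked and grid are made up here.

```r
# Rebuild the sample data from the question
start <- as.Date("2016-04-08")
d1 <- data.frame(Date = seq(start, by = "2 days", length.out = 5),
                 Sales = c(5, 10, 15, 20, 25))
d2 <- data.frame(Date = seq(start, by = "1 day", length.out = 10),
                 Sales = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))
my.list <- list(d1, d2)

# Stack the list into one data frame, tagging each row with its list position
stacked <- do.call(rbind, Map(function(id, df) data.frame(product.id = id, df),
                              seq_along(my.list), my.list))

# All combinations of product id and date (the base R counterpart of CJ())
grid <- expand.grid(
  product.id = unique(stacked$product.id),
  Date = seq(min(stacked$Date), max(stacked$Date), by = "1 day")
)

# Left join onto the grid: combinations without a record get Sales = NA
all_products <- merge(grid, stacked, by = c("product.id", "Date"), all.x = TRUE)
all_products <- all_products[order(all_products$product.id, all_products$Date), ]
all_products
```

On large inputs the data.table version is likely the better choice; this variant mainly shows that the approach does not depend on a particular package.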
Subsequent operations can be grouped by product.id, e.g., to determine the number of valid sales records for each product:
all_products[!is.na(Sales), .(valid.sales.data = .N), by = product.id]
   product.id valid.sales.data
1:          1                5
2:          2               10
Or, the total sales per product:
all_products[, .(total.sales = sum(Sales, na.rm = TRUE)), by = product.id]
   product.id total.sales
1:          1          75
2:          2          55
If required for some reason, the result can be converted back to a list by
split(all_products, by = "product.id")