Add rows to list of dataframes from another dataframe
Suppose we have a list lis:
chicago = data.frame('city' = rep('chicago'), 'year' = c(2018,2019,2020), 'population' = c(100, 105, 110))
paris = data.frame('city' = rep('paris'), 'year' = c(2018,2019,2020), 'population' = c(200, 205, 210))
berlin = data.frame('city' = rep('berlin'), 'year' = c(2018,2019,2020), 'population' = c(300, 305, 310))
bangalore = data.frame('city' = rep('bangalore'), 'year' = c(2018,2019,2020), 'population' = c(400, 405, 410))
lis = list(chicago = chicago, paris = paris, berlin = berlin, bangalore = bangalore)
Now I have a new data frame, df, containing the latest data for each city:
df = data.frame('city' = c('chicago', 'paris', 'berlin', 'bangalore'), 'year' = rep(2021), 'population' = c(115, 215, 315, 415))
I want to add each row of df to lis based on city.
I currently do it like this:
# convert the list to a single data frame
lis = dplyr::bind_rows(lis)
# append the new rows
lis = rbind(lis, df)
# convert back to a list
lis = split(lis, lis$city)
This is inefficient for large datasets. Is there a more efficient alternative?
Thank you.
Edit
Unit: seconds
expr min lq mean median uq max neval
ac() 22.43719 23.17452 27.85401 24.80335 25.62127 43.23373 5
The list contains 2239 data frames, each of dimension 310x15. Each of these data frames grows daily.
We may use imap to loop over the list, and filter 'df' based on the names of the list to append the matching row to each list element:
library(dplyr)
library(purrr)
lis2 <- imap(lis, ~ .x %>%
bind_rows(df %>%
filter(city == .y)))
-output
> lis2
$chicago
city year population
1 chicago 2018 100
2 chicago 2019 105
3 chicago 2020 110
4 chicago 2021 115
$paris
city year population
1 paris 2018 200
2 paris 2019 205
3 paris 2020 210
4 paris 2021 215
$berlin
city year population
1 berlin 2018 300
2 berlin 2019 305
3 berlin 2020 310
4 berlin 2021 315
$bangalore
city year population
1 bangalore 2018 400
2 bangalore 2019 405
3 bangalore 2020 410
4 bangalore 2021 415
Or use base R with Map and rbind:
Map(function(x, nm) rbind(x, df[df$city == nm,]), lis, names(lis))
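As a small illustration of how this pattern copes with gaps, here is a sketch that assumes an update covering only some of the cities. The cut-down lis and the one-row df below are hypothetical, not the question's full data. For a city with no new row, the logical subset returns zero rows and rbind leaves that list element unchanged.

```r
# Cut-down version of the question's list (two cities only)
lis <- list(
  chicago = data.frame(city = "chicago", year = c(2018, 2019, 2020),
                       population = c(100, 105, 110)),
  paris   = data.frame(city = "paris", year = c(2018, 2019, 2020),
                       population = c(200, 205, 210))
)

# Hypothetical update that only contains a row for chicago
df <- data.frame(city = "chicago", year = 2021, population = 115)

# Same Map/rbind pattern as above
lis2 <- Map(function(x, nm) rbind(x, df[df$city == nm, ]), lis, names(lis))

nrow(lis2$chicago)  # 4: the 2021 row was appended
nrow(lis2$paris)    # 3: nothing matched, so the element is unchanged
```

The element names of lis are carried over to the result, so lis2 can replace lis directly.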
Or use rbindlist from data.table:
library(data.table)
rbindlist(c(lis, list(df)))[, .(split(.SD, city))]$V1
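One thing to keep in mind with any bind-then-split route, including the one in the question: split orders the groups by sorted group name, so the result may not come back in the original order of lis. The sketch below shows this in base R only, with do.call(rbind, ...) standing in for rbindlist purely to avoid a package dependency. The lapply construction is shorthand that reproduces the question's example data.

```r
# Rebuild the question's example data in a compact way
cities <- c("chicago", "paris", "berlin", "bangalore")
lis <- lapply(seq_along(cities), function(i) {
  data.frame(city = cities[i], year = c(2018, 2019, 2020),
             population = 100 * i + c(0, 5, 10))
})
names(lis) <- cities
df <- data.frame(city = cities, year = 2021,
                 population = 100 * seq_along(cities) + 15)

# Bind everything into one data frame, then split by city again
combined <- do.call(rbind, c(unname(lis), list(df)))
sorted <- split(combined, combined$city)
names(sorted)    # "bangalore" "berlin" "chicago" "paris"

# Index by the original names to restore the order of lis
restored <- sorted[names(lis)]
names(restored)  # "chicago" "paris" "berlin" "bangalore"
```

If downstream code looks elements up by name the reordering is harmless, but anything that relies on position would need the final indexing step.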
Or, slightly more efficiently, with split:
Map(rbind, lis, split(df, df$city)[names(lis)])
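To make the role of [names(lis)] concrete: split(df, df$city) returns the pieces of df in sorted city order, and indexing by names(lis) lines them up with the elements of lis before Map pairs them. A city that appears in df but not in lis would be dropped by that indexing step. The following sketch, again using a compact reconstruction of the question's data, checks that this route and the filter-based Map route above give the same data once row names are reset. The helper reset_rownames is introduced here only for the comparison.

```r
# Rebuild the question's example data in a compact way
cities <- c("chicago", "paris", "berlin", "bangalore")
lis <- lapply(seq_along(cities), function(i) {
  data.frame(city = cities[i], year = c(2018, 2019, 2020),
             population = 100 * i + c(0, 5, 10))
})
names(lis) <- cities
df <- data.frame(city = cities, year = 2021,
                 population = 100 * seq_along(cities) + 15)

# Route 1: subset df once per list element
by_filter <- Map(function(x, nm) rbind(x, df[df$city == nm, ]), lis, names(lis))

# Route 2: split df once, align the pieces to lis by name, then rbind pairwise
pieces <- split(df, df$city)[names(lis)]
by_split <- Map(rbind, lis, pieces)

# Hypothetical helper: drop row names so only the data are compared
reset_rownames <- function(l) lapply(l, function(x) { rownames(x) <- NULL; x })

agree <- isTRUE(all.equal(reset_rownames(by_filter), reset_rownames(by_split)))
agree  # TRUE when both routes produce the same rows in the same order
```

The split route scans df once rather than once per list element, which is the likely source of the modest gain mentioned above. No timings are claimed here.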