Use list names inside purrr::map_dfr function
I was trying something relatively simple, but ran into some trouble. Let's say I have two dataframes, df1 and df2:
df1:
id expenditure
1 10
2 20
1 30
2 50
df2:
id expenditure
1 30
2 50
1 60
2 10
I also added them to a named list:
table_list = list()
table_list[['a']] = df1
table_list[['b']] = df2
And now I want to perform some summary operation through a function and then bind the resulting rows:
get_summary = function(table){
  final_table = table %>% group_by(id) %>% summarise(total_expenditure = sum(expenditure))
  return(final_table)
}
And then apply this through map_dfr:
summary = table_list %>% map_dfr(get_summary, .id = 'origin_table')
This produces almost what I'm looking for:
origin_table id total_expenditure
a 1 40
a 2 70
b 1 90
b 2 60
But what if I would like to do something specific depending on which element of the list is being passed, something like this:
get_summary = function(table, name){
  dummy_list = c(TRUE, FALSE)
  names(dummy_list) = c('a', 'b')
  final_table = table %>% group_by(id) %>% summarise(total_expenditure = sum(expenditure))
  is_true = dummy_list[[name]] # Want to use the original name to call another list
  if(is_true) final_table = final_table %>% mutate(total_expenditure = total_expenditure + 1)
  return(final_table)
}
This would produce something like this:
origin_table id total_expenditure
a 1 41
a 2 71
b 1 90
b 2 60
So is there any way to use list names inside my function? Or any way to identify which element of my list I'm working with? Maybe map_dfr is too restrictive and I have to use something else?
Edit: changed the example so it is more grounded in reality.
Instead of using map, use imap, which passes the name of each list element to the function as .y:
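To see what imap hands to the function, here is a minimal sketch on a toy list (the list and its values are made up for illustration and are not part of the question). For a named list, .y is the element's name; for an unnamed list, it falls back to the element's position.

```r
library(purrr)

# Toy named list (illustrative values only)
x <- list(a = 10, b = 20)

# .x is the element, .y is its name
labelled <- imap_chr(x, ~ paste0(.y, ": ", .x))

# With an unnamed list, .y is the integer position instead
positions <- imap_int(list(10, 20), ~ .y)
```

So whatever the function receives as its second argument can be used to look things up by name, as long as the list is named.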
library(purrr)
library(dplyr)
get_summary = function(dat, name){
  dat %>%
    group_by(id) %>%
    summarise(total_expenditure = sum(expenditure, na.rm = TRUE),
              .groups = "drop") %>%
    mutate(total_expenditure = if(name == 'a')
      total_expenditure + 1 else total_expenditure)
}
-testing
> table_list %>%
imap_dfr(~ get_summary(.x, name = .y), .id = 'origin_table')
# A tibble: 4 × 3
origin_table id total_expenditure
<chr> <int> <dbl>
1 a 1 41
2 a 2 71
3 b 1 90
4 b 2 60
table_list <- list(a = structure(list(id = c(1L, 2L, 1L, 2L),
expenditure = c(10L,
20L, 30L, 50L)), class = "data.frame", row.names = c(NA, -4L)),
b = structure(list(id = c(1L, 2L, 1L, 2L), expenditure = c(30L,
50L, 60L, 10L)), class = "data.frame", row.names = c(NA,
-4L)))
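As a side note, from purrr 1.0.0 onwards the *_dfr variants are marked as superseded in favour of map()/imap() followed by list_rbind(). Below is a sketch of the same idea in that style. It assumes purrr >= 1.0.0 and reuses the lookup-vector idea from the question; add_one is a hypothetical name chosen for this example.

```r
library(purrr)
library(dplyr)

table_list <- list(
  a = data.frame(id = c(1L, 2L, 1L, 2L), expenditure = c(10L, 20L, 30L, 50L)),
  b = data.frame(id = c(1L, 2L, 1L, 2L), expenditure = c(30L, 50L, 60L, 10L))
)

# Hypothetical lookup: which tables get the +1 adjustment
add_one <- c(a = TRUE, b = FALSE)

get_summary <- function(dat, name) {
  out <- dat %>%
    group_by(id) %>%
    summarise(total_expenditure = sum(expenditure), .groups = "drop")
  # name is the list element's name, supplied by imap()
  if (add_one[[name]]) {
    out <- out %>% mutate(total_expenditure = total_expenditure + 1)
  }
  out
}

# imap() calls get_summary(element, name); list_rbind() stacks the results
# and stores the list names in the origin_table column
result <- table_list %>%
  imap(get_summary) %>%
  list_rbind(names_to = "origin_table")
```

Keeping the per-table flags in a named vector outside the function means new tables can be handled without touching the summary logic.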
Managed to do it by adding origin_table as a pre-existing column on the dataframes:
df1 = df1 %>% mutate(origin_table = 'a')
df2 = df2 %>% mutate(origin_table = 'b')
Then I can extract the origin by doing the following:
get_summary = function(table){
  dummy_list = c(TRUE, FALSE)
  names(dummy_list) = c('a', 'b')
  # Each table carries a single origin_table value, so this yields one name
  origin = table %>% distinct(origin_table) %>% pull()
  final_table = table %>% group_by(id) %>% summarise(total_expenditure = sum(expenditure))
  is_true = dummy_list[[origin]] # Use the origin name to look up the flag
  if(is_true) final_table = final_table %>% mutate(total_expenditure = total_expenditure + 1)
  return(final_table)
}
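For completeness, here is a sketch of how this version might be called end to end. It assumes the named list is built after origin_table has been added to each dataframe. Because summarise() drops that column, .id is still used to label the rows in the combined result. The name summary_tbl is just a placeholder for this example.

```r
library(purrr)
library(dplyr)

df1 <- data.frame(id = c(1, 2, 1, 2), expenditure = c(10, 20, 30, 50)) %>%
  mutate(origin_table = "a")
df2 <- data.frame(id = c(1, 2, 1, 2), expenditure = c(30, 50, 60, 10)) %>%
  mutate(origin_table = "b")

# Build the list only after the origin_table column exists
table_list <- list(a = df1, b = df2)

get_summary <- function(table) {
  dummy_list <- c(a = TRUE, b = FALSE)
  # Read the table's own label instead of relying on the list name
  origin <- table %>% distinct(origin_table) %>% pull()
  final_table <- table %>%
    group_by(id) %>%
    summarise(total_expenditure = sum(expenditure))
  if (dummy_list[[origin]]) {
    final_table <- final_table %>% mutate(total_expenditure = total_expenditure + 1)
  }
  final_table
}

summary_tbl <- table_list %>% map_dfr(get_summary, .id = "origin_table")
```

The trade-off compared with imap is that the label is stored redundantly in every row of the input, but the function no longer needs a second argument.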