Write a function to perform a conditional summarize in R using a named list
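I'm trying to write a function that takes a tibble and a list of filter specifications, and performs a conditional summarize based on those filter specifications.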
library(dplyr)

# Sample DF with a column to summarize and 2 ID columns.
df <- tibble(
  to_summarize = c(1, 2, 8, 9),
  ID1 = c('A', 'A', 'C', 'A'),
  ID2 = c('X', 'Y', 'Z', 'X')
)
We can do a conditional summarize using both IDs (returns 10) or using one ID (returns 12).
df %>%
  summarize(
    total1 = sum(to_summarize[ID1 == 'A' & ID2 == 'X']),
    total2 = sum(to_summarize[ID1 == 'A'])
  )
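I'd like to allow the same flexibility in a function. The user should be able to supply either a list of filters or an empty list (in which case the summary is computed over the entire column, with no filtering).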
I think the simplest approach is a named list, where each name is a column to filter on and each value is the value to filter that column by.
filters <- list(
  ID1 = 'A',
  ID2 = 'X'
)
# Here is my attempt at a function to implement this:
summarise_and_filter <- function(df, filters) {
  df %>%
    summarise(
      total = sum(to_summarize[names(filters) == unname(unlist(filters))]))
}
# It does not work, it just returns zero
df %>%
summarise_and_filter(
filters = filters
)
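The zero appears to come from what the index expression compares. names(filters) == unname(unlist(filters)) compares the column names with the filter values ("ID1" == "A" and "ID2" == "X") and never looks at the data. The resulting length-2 vector c(FALSE, FALSE) is recycled over the four rows, so no element is selected, and the sum of an empty numeric vector is 0. A minimal check, assuming dplyr is installed:

```r
library(dplyr)

df <- tibble(
  to_summarize = c(1, 2, 8, 9),
  ID1 = c("A", "A", "C", "A"),
  ID2 = c("X", "Y", "Z", "X")
)
filters <- list(ID1 = "A", ID2 = "X")

# Compares the names "ID1"/"ID2" with the values "A"/"X", not with column contents
idx <- names(filters) == unname(unlist(filters))
print(idx)  # c(FALSE, FALSE)

# A length-2 logical index is recycled over 4 elements, so nothing is selected
selected <- df$to_summarize[idx]
print(selected)       # numeric(0)
print(sum(selected))  # 0
```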
# I imagine the function might need to call map in some way, or perhaps imap?
library(purrr)
map_summarise_and_filter <- function(df, filters) {
  df %>%
    summarise(
      total = sum(
        to_summarize[
          imap_lgl(
            filters,
            ~.y == .x
          )]
      )
    )
}
# But this also returns zero
df %>%
map_summarise_and_filter(
filters = filters
)
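The imap attempt seems to fail for a similar reason. Inside imap_lgl(), .x is each list value and .y is its name, so ~ .y == .x again evaluates "ID1" == "A" and "ID2" == "X". That gives one FALSE per filter instead of one logical per row. A sketch of the comparison that was probably intended looks the column up by name with df[[.y]], which yields one row-level logical vector per filter. This assumes dplyr and purrr are installed:

```r
library(dplyr)
library(purrr)

df <- tibble(
  to_summarize = c(1, 2, 8, 9),
  ID1 = c("A", "A", "C", "A"),
  ID2 = c("X", "Y", "Z", "X")
)
filters <- list(ID1 = "A", ID2 = "X")

# What the attempt computes: one value per filter (name compared with value)
per_filter <- imap_lgl(filters, ~ .y == .x)
print(per_filter)  # ID1 = FALSE, ID2 = FALSE

# Comparing the named column's contents with the value instead gives
# one logical vector (one element per row) for each filter
per_row <- imap(filters, ~ df[[.y]] == .x)
str(per_row)
```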
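There are two operations being done here, and one of them (total1, which uses every filter) can be computed dynamically from filters: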
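library(dplyr)
df %>%
  # total2 only uses the ID1 filter, so compute it before any rows are dropped
  mutate(total2 = sum(to_summarize[ID1 == filters[['ID1']]])) %>%
  # keep rows where every ID column equals its value in filters
  # (if_all() replaces across() inside filter(), deprecated since dplyr 1.0.8)
  filter(if_all(starts_with("ID"), ~ . == filters[[cur_column()]])) %>%
  summarise(total1 = sum(to_summarize), total2 = first(total2))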
-output
# A tibble: 1 x 2
total1 total2
<dbl> <dbl>
1 10 12
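The filter-based approach can also be wrapped in a function that accepts any named list, including an empty one. The sketch below assumes dplyr, purrr and rlang are installed and that every name in filters is a column of df. summarise_with_filter is a hypothetical name, and the summed column to_summarize is hard-coded as in the question. Instead of starts_with("ID"), it builds one condition such as ID1 == "A" per list element and splices those conditions into filter():

```r
library(dplyr)
library(purrr)

df <- tibble(
  to_summarize = c(1, 2, 8, 9),
  ID1 = c("A", "A", "C", "A"),
  ID2 = c("X", "Y", "Z", "X")
)

# Hypothetical helper: filter on every name = value pair, then summarise
summarise_with_filter <- function(df, filters) {
  # Build one unevaluated condition per filter, e.g. ID1 == "A"
  conditions <- imap(filters, ~ rlang::expr(!!rlang::sym(.y) == !!.x))
  df %>%
    # filter() rejects named arguments, so drop the list names before splicing
    filter(!!!unname(conditions)) %>%
    summarise(total = sum(to_summarize))
}

summarise_with_filter(df, list(ID1 = "A", ID2 = "X"))  # total = 10
summarise_with_filter(df, list(ID1 = "A"))             # total = 12
summarise_with_filter(df, list())                      # total = 20
```

Because filter() called with no conditions returns every row, the empty-list case needs no special handling.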
If we want to do this without filter, reduce the across output to a single logical vector and use it to subset to_summarize:
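library(purrr)
df %>%
  summarise(
    # across() returns one logical column per ID column; reduce(`&`)
    # combines them into a single row mask used to subset to_summarize
    total1 = sum(to_summarize[across(starts_with('ID'),
                                     ~ . == filters[[cur_column()]]) %>%
                                reduce(`&`)]),
    total2 = sum(to_summarize[ID1 == filters[['ID1']]])
  )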
-output
# A tibble: 1 x 2
total1 total2
<dbl> <dbl>
1 10 12
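The reduce idea can be turned into the function the question asks for by building the row mask from the named list itself and giving reduce() an .init value, so an empty list keeps every row. The sketch below assumes dplyr and purrr are installed, every name in filters is a column of df, and df is not grouped (the mask spans all rows). summarise_with_mask is a hypothetical name, and to_summarize is hard-coded as in the question:

```r
library(dplyr)
library(purrr)

df <- tibble(
  to_summarize = c(1, 2, 8, 9),
  ID1 = c("A", "A", "C", "A"),
  ID2 = c("X", "Y", "Z", "X")
)

# Hypothetical helper: builds the row mask outside summarise()
summarise_with_mask <- function(df, filters) {
  # One logical vector per filter: does column .y equal value .x?
  conditions <- imap(filters, ~ df[[.y]] == .x)
  # AND the vectors together; with no filters, .init keeps every row
  keep <- reduce(conditions, `&`, .init = rep(TRUE, nrow(df)))
  # .env$keep refers to the local variable, not a column named "keep"
  df %>% summarise(total = sum(to_summarize[.env$keep]))
}

summarise_with_mask(df, list(ID1 = "A", ID2 = "X"))  # total = 10
summarise_with_mask(df, list(ID1 = "A"))             # total = 12
summarise_with_mask(df, list())                      # total = 20
```

Computing the mask before summarise() avoids relying on across() and cur_column(), at the cost of not respecting any grouping on df.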