R summarize_if more complex functions to multiple columns
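Hi, I am looking for an elegant solution, ideally one that combines dplyr and purrr.
I group the data by id. I then have 4 numeric columns, and I want to apply the sum function to all of them. In addition, I want to apply conditional sums to those 4 numeric columns based on the diff column.
So, to be specific, for all 4 columns I want: the total sum, the sum where diff < 11, and the sum where diff < 21.
I started with this but got stuck: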
df%>%group_by(id)%>%summarise_if(is.numeric,.funs = (sum,~sum(.[diff<11]))
df<-structure(list(id = c(10274565, 10274449, 10274449, 10274449,
10274565, 10274557, 10274557, 10274449, 10274565, 10274565, 10274565,
10274557, 10274565, 10274557, 10274557, 10274557, 10274557, 10274557,
10274557, 10274449, 10274449), d_amt = c(70L, 52L, 47L, 31L,
100L, 17L, 74L, 54L, 83L, 90L, 76L, 98L, 73L, 49L, 81L, 82L,
80L, 24L, 30L, 21L, 43L), d_cnt = c(3L, 4L, 3L, 3L, 5L, 3L, 1L,
3L, 1L, 3L, 3L, 3L, 5L, 1L, 4L, 1L, 1L, 5L, 4L, 4L, 5L), w_amt = c(74L,
16L, 20L, 73L, 22L, 11L, 61L, 90L, 78L, 94L, 64L, 58L, 84L, 15L,
42L, 31L, 53L, 92L, 76L, 14L, 65L), w_cnt = c(4L, 5L, 1L, 1L,
5L, 2L, 4L, 3L, 3L, 2L, 5L, 1L, 4L, 1L, 4L, 4L, 1L, 1L, 4L, 3L,
1L), diff = structure(c(43, 30, 20, 16, 22, 57, 50, 40, 64, 51,
50, 8, 88, 85, 79, 43, 28, 22, 17, 13, 3), class = "difftime", units = "days")), row.names = c(NA,
-21L), class = c("tbl_df", "tbl", "data.frame"))
You can try:
library(dplyr)
df %>%
  group_by(id) %>%
  summarise(
    across(
      where(is.integer),
      list(below11 = ~ sum(.[diff < 11]),
           below21 = ~ sum(.[diff < 21]))
    )
  )
Output:
# A tibble: 3 x 9
        id d_amt_below11 d_amt_below21 d_cnt_below11 d_cnt_below21 w_amt_below11 w_amt_below21 w_cnt_below11 w_cnt_below21
     <dbl>         <int>         <int>         <int>         <int>         <int>         <int>         <int>         <int>
1 10274449            43           142             5            15            65           172             1             6
2 10274557            98           128             3             7            58           134             1             5
3 10274565             0             0             0             0             0             0             0             0
Note that the columns in your example appear to be integer, hence the is.integer part.
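The question also asked for the plain total next to the two conditional sums. One way to get all three in a single call is to add a bare `sum` as another named entry in the list. This is only a minimal sketch: the `toy` tibble and its values are made up for illustration and merely mimic the shape of `df`, and the name `total` is chosen here rather than taken from the answer.

```r
library(dplyr)

# Hypothetical stand-in for the question's `df` (made-up values).
toy <- tibble(
  id    = c(1, 1, 1, 2, 2),
  d_amt = c(10L, 20L, 30L, 40L, 50L),
  w_amt = c(1L, 2L, 3L, 4L, 5L),
  diff  = as.difftime(c(5, 15, 30, 8, 25), units = "days")
)

res <- toy %>%
  group_by(id) %>%
  summarise(
    across(
      where(is.integer),
      list(total   = sum,                     # unconditional sum
           below11 = ~ sum(.x[diff < 11]),    # rows with diff < 11
           below21 = ~ sum(.x[diff < 21]))    # rows with diff < 21
    )
  )

print(res)
```

Each list name becomes a suffix, so the result has columns such as `d_amt_total`, `d_amt_below11` and `d_amt_below21`.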
You can use:
library(dplyr)
df %>%
  group_by(id) %>%
  summarise(across(where(is.numeric), ~sum(.[diff < 11])))
# id d_amt d_cnt w_amt w_cnt
# <dbl> <int> <int> <int> <int>
#1 10274449 43 5 65 1
#2 10274557 98 3 58 1
#3 10274565 0 0 0 0
Or, if you are using an older version of dplyr (before 1.0.0, which introduced across()):
df %>%
  group_by(id) %>%
  summarise_if(is.numeric, ~sum(.[diff < 11]))
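For reference, the attempt in the question is not valid R, because `(sum, ~sum(...))` cannot be parsed. The scoped verbs expect several functions to be wrapped in `list()`. The following is a sketch of that form, assuming a dplyr version in which `summarise_if()` is still available. The `toy` tibble is a made-up stand-in for `df`, and the names `total`, `below11` and `below21` are chosen here for illustration.

```r
library(dplyr)

# Hypothetical stand-in for the question's `df` (made-up values).
toy <- tibble(
  id    = c(1, 1, 1, 2, 2),
  d_amt = c(10L, 20L, 30L, 40L, 50L),
  w_amt = c(1L, 2L, 3L, 4L, 5L),
  diff  = as.difftime(c(5, 15, 30, 8, 25), units = "days")
)

res_old <- toy %>%
  group_by(id) %>%
  summarise_if(
    is.numeric,   # the difftime column `diff` is not numeric, so it is skipped
    list(total   = sum,
         below11 = ~ sum(.[diff < 11]),
         below21 = ~ sum(.[diff < 21]))
  )

print(res_old)
```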
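You can bin the diff column into categories, group by id and category, and then summarise each group. Note that the bins are mutually exclusive, so the "<22" row covers 11 <= diff < 22 rather than everything below 22, and that the code uses 22 as the upper cutoff where the question asked for 21:
library(dplyr)
df %>%
  mutate(dif_cat = case_when(diff < 11 ~ "<11",
                             diff < 22 ~ "<22",
                             TRUE ~ ">=22")) %>%
  group_by(id, dif_cat) %>%
  summarise(across(where(is.numeric), ~sum(.)))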
# A tibble: 7 x 6
# Groups:   id [3]
        id dif_cat d_amt d_cnt w_amt w_cnt
     <dbl> <chr>   <int> <int> <int> <int>
1 10274449 <11        43     5    65     1
2 10274449 <22        99    10   107     5
3 10274449 >=22      106     7   106     8
4 10274557 <11        98     3    58     1
5 10274557 <22        30     4    76     4
6 10274557 >=22      407    16   305    17
7 10274565 >=22      492    20   416    23
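Because the binned result holds one row per exclusive bin, the cumulative figures the question asks for (everything below 11, everything below 21, everything) can be recovered with a running sum within each id. The sketch below is one possible way to do that and is not part of the answer above. It swaps `case_when()` for `cut()` so that the bins are an ordered factor, uses the question's cutoff of 21, and runs on a made-up `toy` tibble. The label `all` is a name chosen here.

```r
library(dplyr)

# Hypothetical stand-in for the question's `df` (made-up values).
toy <- tibble(
  id    = c(1, 1, 1, 2, 2),
  d_amt = c(10L, 20L, 30L, 40L, 50L),
  w_amt = c(1L, 2L, 3L, 4L, 5L),
  diff  = as.difftime(c(5, 15, 30, 8, 25), units = "days")
)

cumulative <- toy %>%
  # Exclusive bins [-Inf, 11), [11, 21), [21, Inf), labelled by upper bound.
  mutate(dif_cat = cut(as.numeric(diff, units = "days"),
                       breaks = c(-Inf, 11, 21, Inf),
                       labels = c("<11", "<21", "all"),
                       right  = FALSE)) %>%
  group_by(id, dif_cat) %>%
  summarise(across(where(is.integer), sum), .groups = "drop_last") %>%
  # Still grouped by id: a running sum over the ordered bins turns
  # per-bin sums into cumulative sums.
  mutate(across(where(is.integer), cumsum))

print(cumulative)
```

One caveat with this design is that an id with no rows in a given bin gets no row for that bin, so the wide `across()` form with one lambda per threshold may be easier to work with when every id needs every threshold.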