
R summarise_if: applying more complex functions to multiple columns

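Hi, I'm looking for an elegant solution, ideally one that combines dplyr and purrr.

I group the data by id, and I have 4 numeric columns to which I want to apply the sum function. In addition, I want to apply conditional sums to those 4 numeric columns, based on the diff column.

To be specific, I want the sum of all 4 columns, the sum of all 4 columns where diff < 11, and the sum of all 4 columns where diff < 21.

I started with this but got stuck: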

df%>%group_by(id)%>%summarise_if(is.numeric,.funs = (sum,~sum(.[diff<11])) 


    df<-structure(list(id = c(10274565, 10274449, 10274449, 10274449, 
10274565, 10274557, 10274557, 10274449, 10274565, 10274565, 10274565, 
10274557, 10274565, 10274557, 10274557, 10274557, 10274557, 10274557, 
10274557, 10274449, 10274449), d_amt = c(70L, 52L, 47L, 31L, 
100L, 17L, 74L, 54L, 83L, 90L, 76L, 98L, 73L, 49L, 81L, 82L, 
80L, 24L, 30L, 21L, 43L), d_cnt = c(3L, 4L, 3L, 3L, 5L, 3L, 1L, 
3L, 1L, 3L, 3L, 3L, 5L, 1L, 4L, 1L, 1L, 5L, 4L, 4L, 5L), w_amt = c(74L, 
16L, 20L, 73L, 22L, 11L, 61L, 90L, 78L, 94L, 64L, 58L, 84L, 15L, 
42L, 31L, 53L, 92L, 76L, 14L, 65L), w_cnt = c(4L, 5L, 1L, 1L, 
5L, 2L, 4L, 3L, 3L, 2L, 5L, 1L, 4L, 1L, 4L, 4L, 1L, 1L, 4L, 3L, 
1L), diff = structure(c(43, 30, 20, 16, 22, 57, 50, 40, 64, 51, 
50, 8, 88, 85, 79, 43, 28, 22, 17, 13, 3), class = "difftime", units = "days")), row.names = c(NA, 
-21L), class = c("tbl_df", "tbl", "data.frame"))

You can try:

library(dplyr)

df %>%
  group_by(id) %>%
  summarise(
    across(
      where(is.integer),
      list(below11 = ~ sum(.[diff < 11]),
           below21 = ~ sum(.[diff < 21]))
    )
  )

Output:

# A tibble: 3 x 9
        id d_amt_below11 d_amt_below21 d_cnt_below11 d_cnt_below21 w_amt_below11 w_amt_below21 w_cnt_below11 w_cnt_below21
     <dbl>         <int>         <int>         <int>         <int>         <int>         <int>         <int>         <int>
1 10274449            43           142             5            15            65           172             1             6
2 10274557            98           128             3             7            58           134             1             5
3 10274565             0             0             0             0             0             0             0             0

Note that the columns in your example appear to be integer, hence the is.integer part.
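The question also asks for the unconditional sum. A minimal sketch of how that could be added to the same across() call is shown here. It assumes dplyr 1.0.0 or later, and it uses a small made-up data frame (`toy`, not the asker's `df`) so that the result is easy to check by hand.

```r
library(dplyr)

# Hypothetical toy data, much smaller than the df in the question
toy <- tibble(
  id    = c(1, 1, 1, 2, 2),
  d_amt = c(10L, 20L, 30L, 5L, 7L),
  w_amt = c(1L, 2L, 3L, 4L, 5L),
  diff  = as.difftime(c(3, 15, 40, 8, 25), units = "days")
)

res <- toy %>%
  group_by(id) %>%
  summarise(
    across(
      where(is.integer),
      list(total   = sum,                      # plain sum of the column
           below11 = ~ sum(.x[diff < 11]),     # rows with diff < 11 only
           below21 = ~ sum(.x[diff < 21]))     # rows with diff < 21 only
    )
  )

res
```

Each name in the list becomes a suffix, so this should give columns such as d_amt_total, d_amt_below11 and d_amt_below21 for every integer column.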

You can use:

library(dplyr)
df %>%
  group_by(id) %>%
  summarise(across(where(is.numeric), ~sum(.[diff < 11])))

#        id d_amt d_cnt w_amt w_cnt
#     <dbl> <int> <int> <int> <int>
#1 10274449    43     5    65     1
#2 10274557    98     3    58     1
#3 10274565     0     0     0     0

Or, if you are using an older version of dplyr (before 1.0.0, which introduced across()):

df %>%
  group_by(id) %>%
  summarise_if(is.numeric, ~sum(.[diff < 11]))
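The attempt in the question fails because the functions are wrapped in plain parentheses rather than list(). A sketch of the corrected call, again on a made-up data frame (`toy` is an assumption, not the asker's data), might look like this:

```r
library(dplyr)

# Hypothetical toy data, much smaller than the df in the question
toy <- tibble(
  id    = c(1, 1, 1, 2, 2),
  d_amt = c(10L, 20L, 30L, 5L, 7L),
  w_amt = c(1L, 2L, 3L, 4L, 5L),
  diff  = as.difftime(c(3, 15, 40, 8, 25), units = "days")
)

res_if <- toy %>%
  group_by(id) %>%
  summarise_if(
    is.numeric,
    # A named list supplies several functions; names become column suffixes
    list(total = sum, below11 = ~ sum(.[diff < 11]))
  )

res_if
```

The diff column is left out of the summary because is.numeric() returns FALSE for difftime objects, yet it can still be referenced inside the formula.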

You can bin the diff column into categories, group by them, and then summarise each group:

library(dplyr)
df %>% 
  mutate(dif_cat = case_when(diff < 11 ~ "<11",
                             diff < 22 ~ "<22",
                             TRUE ~ ">=22")) %>% 
  group_by(id, dif_cat) %>% 
  summarise(across(where(is.numeric), ~sum(.)))


# A tibble: 7 x 6
# Groups:   id [3]
        id dif_cat d_amt d_cnt w_amt w_cnt
     <dbl> <chr>   <int> <int> <int> <int>
1 10274449 <11        43     5    65     1
2 10274449 <22        99    10   107     5
3 10274449 >=22      106     7   106     8
4 10274557 <11        98     3    58     1
5 10274557 <22        30     4    76     4
6 10274557 >=22      407    16   305    17
7 10274565 >=22      492    20   416    23
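Note that these bands are disjoint, whereas the question asks for cumulative thresholds (the "below 21" sum includes the rows below 11). One possible way to recover cumulative sums from the bands is a cumsum() within each id, sketched here on made-up data. The `toy` data frame, the factor levels and the `d_amt_cum` column name are assumptions for illustration only.

```r
library(dplyr)

# Hypothetical toy data, much smaller than the df in the question
toy <- tibble(
  id    = c(1, 1, 1, 2, 2),
  d_amt = c(10L, 20L, 30L, 5L, 7L),
  diff  = as.difftime(c(3, 15, 40, 8, 25), units = "days")
)

bands <- toy %>%
  mutate(
    # An explicit factor fixes the band order instead of relying on string sorting
    dif_cat = factor(
      case_when(diff < 11 ~ "<11",
                diff < 21 ~ "<21",
                TRUE ~ ">=21"),
      levels = c("<11", "<21", ">=21")
    )
  ) %>%
  group_by(id, dif_cat) %>%
  summarise(d_amt = sum(d_amt), .groups = "drop_last") %>%
  mutate(d_amt_cum = cumsum(d_amt)) %>%   # running total across bands, per id
  ungroup()

bands
```

A band with no rows for a given id does not appear in the result at all, so for wide, per-threshold columns the across() approach is likely the simpler choice.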

