How to subset inside summarise() in dplyr

Question

I would like to subset within summarise(). Is something like the subset() calls below possible?

df <- structure(list(
  category = structure(c(1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 2L),
                       .Label = c("category MB", "category LR"), class = "factor"),
  start = c(111, 222, 333, 444, 555, 111, 222, 333, 444, 111, 111, 222, 333, 444),
  stop = c(666, 777, 888, 999, 1000, 666, 777, 888, 999, 666, 666, 777, 888, 999),
  ID = c(101, 101, 101, 101, 101, 102, 102, 102, 102, 102, 102, 102, 102, 102)
), row.names = c(NA, -14L), class = "data.frame")

library(dplyr)

df %>%
  group_by(ID) %>%
  summarise(
    countAll = n(),
    durationAll = sum(stop - start),
    countCategoryMB = sum(category == "category MB"),
    durationCategoryMB = sum( subset(., category == "category MB", select = stop) -  subset(., category == "category MB", select = start) ), # line in question, currently wrong
    countCategoryLR = sum(category == "category LR"),
    durationCategoryLR = sum( subset(., category == "category LR", select = stop) -  subset(., category == "category LR", select = start) ) # line in question, currently wrong
  )

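I can get the expected result (image at the end of the post) with left_join(). However, I am hoping the desired output can be produced in a single call, with code similar to the above.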

# expected result achieved with left_join()
df %>%
  group_by(ID) %>%
  summarise(countAll = n(),
            durationALL = sum(stop - start)) %>%
  left_join(
    .,
    df %>%
      filter(category == "category MB") %>%
      group_by(ID) %>%
      summarise(
        countCategoryMB = n(),
        durationCategoryMB = sum(stop - start)
      ),
    by = "ID"
  ) %>%
  left_join(
    .,
    df %>%
      filter(category == "category LR") %>%
      group_by(ID) %>%
      summarise(
        countCategoryLR = n(),
        durationCategoryLR = sum(stop - start)
      ),
    by = "ID"
  )

Thank you for your time!

Answer 1

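In the solution below, (category == "category MB") is a logical vector that R coerces to 1 where it is TRUE and 0 where it is FALSE when it is multiplied by a number. Multiplying start and stop by it therefore zeroes out every row of the other category, so each sum effectively covers only the rows whose category equals "category MB" (or "category LR"), as requested.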
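The coercion can be seen in isolation with a tiny sketch. The vectors `is_mb` and `duration` are made-up stand-ins for `category == "category MB"` and `stop - start`; they are not part of the original data.

```r
# Hypothetical stand-ins, not taken from the question's data
is_mb <- c(TRUE, FALSE, TRUE)   # plays the role of category == "category MB"
duration <- c(10, 20, 30)       # plays the role of stop - start

masked <- is_mb * duration      # TRUE becomes 1, FALSE becomes 0
print(masked)                   # 10  0 30
print(sum(masked))              # 40
```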

df %>% 
  group_by(ID) %>% 
  summarise(
    countAll = n(),
    durationAll = sum(stop - start),
    countCategoryMB = sum(category == "category MB"),
    durationCategoryMB = sum( ((category == "category MB")*stop) - ((category == "category MB")*start) ),
    countCategoryLR = sum(category == "category LR"),
    durationCategoryLR = sum( ((category == "category LR")*stop) - ((category == "category LR")*start) )
)
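As a possible alternative (a sketch, not part of the accepted answer), the same totals can be obtained by indexing the per-group difference with the logical vector instead of multiplying by it. The data frame is rebuilt compactly here only so that the block runs on its own; the name `result` is introduced for illustration.

```r
library(dplyr)

# Same values as the question's df, rebuilt so this block is self-contained
df <- data.frame(
  category = factor(
    c("category MB", "category MB", "category MB", "category LR", "category LR",
      "category MB", "category MB", "category MB", "category LR",
      "category MB", "category MB", "category MB", "category MB", "category LR"),
    levels = c("category MB", "category LR")
  ),
  start = c(111, 222, 333, 444, 555, 111, 222, 333, 444, 111, 111, 222, 333, 444),
  stop  = c(666, 777, 888, 999, 1000, 666, 777, 888, 999, 666, 666, 777, 888, 999),
  ID    = c(rep(101, 5), rep(102, 9))
)

result <- df %>%
  group_by(ID) %>%
  summarise(
    countAll = n(),
    durationAll = sum(stop - start),
    countCategoryMB = sum(category == "category MB"),
    # Keep only the differences of the rows in this group that match the category
    durationCategoryMB = sum((stop - start)[category == "category MB"]),
    countCategoryLR = sum(category == "category LR"),
    durationCategoryLR = sum((stop - start)[category == "category LR"])
  )

print(result)
```

Inside summarise(), column names refer to the current group's values, so the logical index is evaluated per ID. One difference from the multiplication approach: a missing start or stop in a row of the other category is dropped by the index, whereas 0 * NA is NA and would turn the whole sum into NA.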

Solution 1
1 Accepted 2020-03-17 12:24:23
