dplyr summarise: combining the values of certain groups

Question

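I have data on each patient's hospital admissions. I am trying to add up the price of care for patients who were readmitted within 5 days.

Here is a sample dataset: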

(
    dt <- data.frame(
        id         = c(1, 1, 2, 2, 3, 4),
        admit_date = c(1, 9, 5, 9, 10, 20),
        price      = c(10, 20, 20, 30, 15, 16)
    )
)

#   id admit_date price
# 1  1          1    10
# 2  1          9    20
# 3  2          5    20
# 4  2          9    30
# 5  3         10    15
# 6  4         20    16

Here is what I have tried so far:

library(dplyr)

# 5-day readmission:
dt %>%
    group_by(id) %>%
    arrange(id, admit_date) %>%
    mutate(
        duration = admit_date - lag(admit_date),
        readmit = ifelse(duration < 6, 1, 0)
    ) %>%
    group_by(id, readmit) %>%           # this is where I get stuck
    summarize(sumprice = sum(price))

# # A tibble: 6 × 3
# # Groups:   id [4]
#      id readmit sumprice
#   <dbl>   <dbl>    <dbl>
# 1     1       0       20
# 2     1      NA       10
# 3     2       1       30
# 4     2      NA       20
# 5     3      NA       15
# 6     4      NA       16
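The split in this output comes from each patient's first visit: lag() returns NA there, so duration and readmit are NA too, and group_by(id, readmit) puts that visit in its own NA group. A quick way to see this is to stop the pipeline before summarize() and print the intermediate columns (the name flagged is just an illustrative choice):

```r
library(dplyr)

dt <- data.frame(
  id         = c(1, 1, 2, 2, 3, 4),
  admit_date = c(1, 9, 5, 9, 10, 20),
  price      = c(10, 20, 20, 30, 15, 16)
)

# Same mutate() step as the attempt above, without the final summarize()
flagged <- dt %>%
  group_by(id) %>%
  arrange(id, admit_date) %>%
  mutate(
    duration = admit_date - lag(admit_date),  # NA on each patient's first visit
    readmit  = ifelse(duration < 6, 1, 0)     # therefore also NA on that visit
  ) %>%
  ungroup()

print(flagged)
```

For patient 2 only the second visit is flagged (readmit = 1) while the first one sits under NA, which is why 20 and 30 end up on separate rows instead of being combined into 50.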

This is what I want:

#   id sum_price
# 1  1        10
# 2  1        20
# 3  2        50
# 4  3        15
# 5  4        16

Answer 1

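Flag each visit TRUE if the number of days since the patient's previous visit is greater than 5, and FALSE otherwise. For the first visit the lag default is set to Inf, so the difference is -Inf and -Inf > 5 is FALSE. Then, for each patient, take the cumulative sum of these flags; this labels the groups of visits that belong together. Finally, summarise within each patient, using this cumsum as the grouping variable for by():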

dt |>
    group_by(id) |>
    arrange(id, admit_date) |>
    summarise(
        sum_price = by(
            price,
            # TRUE starts a new group; the first visit gives -Inf > 5, i.e. FALSE
            cumsum((admit_date - lag(admit_date, default = Inf)) > 5),
            sum
            )
        ) |>
    ungroup()

# # A tibble: 5 × 2
#      id sum_price
#   <dbl> <by>     
# 1     1 10       
# 2     1 20       
# 3     2 50       
# 4     3 15       
# 5     4 16
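Because summarise() here returns more than one row for some patients, dplyr 1.1.0 and later deprecate this pattern (reframe() is the suggested replacement), and the result column has the unusual <by> class. A minimal sketch of the same cumulative-sum idea that keeps an ordinary numeric column is to store the group label in its own column and summarise by (id, label). The column names new_episode and episode are illustrative, not from the answer:

```r
library(dplyr)

dt <- data.frame(
  id         = c(1, 1, 2, 2, 3, 4),
  admit_date = c(1, 9, 5, 9, 10, 20),
  price      = c(10, 20, 20, 30, 15, 16)
)

res <- dt %>%
  arrange(id, admit_date) %>%
  group_by(id) %>%
  mutate(
    # TRUE when the gap to the previous visit exceeds 5 days; the first visit
    # gives admit_date - Inf = -Inf, which is FALSE, so it opens episode 0
    new_episode = (admit_date - lag(admit_date, default = Inf)) > 5,
    episode     = cumsum(new_episode)
  ) %>%
  group_by(id, episode) %>%
  summarise(sum_price = sum(price), .groups = "drop") %>%  # one row per episode
  select(id, sum_price)

print(res)
```

This yields the same five rows as the desired output (10, 20, 50, 15, 16), with sum_price as a plain <dbl> column. Keeping the episode column instead of dropping it can also be handy if you later want to count readmission episodes per patient.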

Answer 2

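So you want (at most) one row per patient in the final dataframe, which means you should group by id only.

Then, for each patient, compute whether that patient has any rows with readmit == 1.

Finally, filter the summarised dataframe to drop every patient who was not readmitted.

Putting it all together, it might look like this: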

dt %>%
  group_by(id) %>%
  arrange(id, admit_date) %>%
  mutate(duration = admit_date - lag(admit_date),
         readmit = ifelse(duration < 6, 1, 0)) %>%
  group_by(id) %>%  # group by just 'id' to get one row per patient
  summarize(sumprice = sum(price, na.rm = TRUE),
            is_readmit = any(readmit == 1)) %>%  # If patient has any 'readmit' rows, count the patient as a readmit patient
  filter(is_readmit) %>%  # Filter out any non-readmit patients
  select(-is_readmit)  # get rid of the `is_readmit` column

This should give:

# # A tibble: 1 x 2
#      id sumprice
#   <dbl>    <dbl>
# 1     2       50
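This answer depends on how any() and filter() treat missing values. Each patient's first visit has readmit = NA, so any(readmit == 1) is NA rather than FALSE for patients 1, 3 and 4, and filter() drops rows whose condition is NA. A small sketch with hand-built flags (hypothetical values chosen to mirror the answer's intermediate readmit column, not captured from its output) shows both behaviours:

```r
library(dplyr)

# Hypothetical per-patient readmit values: NA for each first visit, then 0 or 1
flags <- list(`1` = c(NA, 0), `2` = c(NA, 1), `3` = NA, `4` = NA)

summary_tbl <- tibble(
  id              = c(1, 2, 3, 4),
  # any() over c(NA, FALSE) is NA; over c(NA, TRUE) it is TRUE
  is_readmit      = unname(vapply(flags, function(r) any(r == 1), logical(1))),
  # with na.rm = TRUE the missing first visit is ignored, giving FALSE instead of NA
  is_readmit_narm = unname(vapply(flags, function(r) any(r == 1, na.rm = TRUE), logical(1)))
)

# filter() keeps only rows where the condition is TRUE, so NA and FALSE are both dropped
kept <- filter(summary_tbl, is_readmit)
print(kept)
```

Either way only patient 2 survives the filter, which is why this answer returns a single row rather than the five rows the question asks for.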


Answer 1: score 0, accepted, 2021-08-23 15:54:52

Answer 2: score -1, 2021-08-23 14:44:11