How to consider the bigger date inside groups after summarize

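I am averaging within groups, three rows at a time, using the summarise function. For each average, I would like to select the last date among the rows that make up that average.

I tried taking the maximum, but that way I just get the highest date of the whole group.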

library(dplyr)  # %>%, arrange(), group_by(), summarise()
library(zoo)    # rollapply()

test = data.frame(my_groups = c("A", "A", "A", "B", "B", "C", "C", "C",  "A", "A", "A"),
                  measure = c(10, 20, 5, 2, 62 ,2, 5, 4, 6, 7, 25),
                  time= c("20-09-2020", "25-09-2020", "19-09-2020", "20-05-2020", "20-06-2021", 
                          "11-01-2021", "13-01-2021", "13-01-2021", "15-01-2021", "15-01-2021", "19-01-2021"))
# > test
#    my_groups measure       time
# 1          A      10 20-09-2020
# 2          A      20 25-09-2020
# 3          A       5 19-09-2020
# 4          B       2 20-05-2020
# 5          B      62 20-06-2021
# 6          C       2 11-01-2021
# 7          C       5 13-01-2021
# 8          C       4 13-01-2021
# 9          A       6 15-01-2021
# 10         A       7 15-01-2021
# 11         A      25 19-01-2021

test %>%
  arrange(time) %>%
  group_by(my_groups) %>%
  summarise(mean_3 = rollapply(measure, 3, mean, by = 3, align = "left", partial = FALSE),
            final_data = max(time))

# my_groups mean_3 final_data
#   <chr>       <dbl> <chr>     
# 1 A           12.7  25-09-2020
# 2 A           11.7  25-09-2020
# 3 C           3.67 13-01-2021

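For the group A window with mean 12.7, I would like the date to be 19-01-2021, not the overall maximum that max(time) returns for group A (25-09-2020).

Any tips on how I can do this?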

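I have two dplyr approaches for you. I'm not entirely happy with them, because when rollapply with max on the dates finds nothing in group B, it defaults to a double, which doesn't match the character values in groups A and C.

mutate: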

test %>%
  arrange(time) %>%
  group_by(my_groups) %>% 
  mutate(final = rollapply(time, 3, max, by = 3, fill = NA, align = "left", partial = FALSE),
         mean_3 = rollapply(measure, 3, mean, by = 3, fill = NA, align = "left", partial = FALSE)) %>% 
  filter(!is.na(final)) %>% 
  select(my_groups, final, mean_3) %>% 
  arrange(my_groups)

# A tibble: 3 x 3
# Groups:   my_groups [2]
  my_groups final      mean_3
  <chr>     <chr>       <dbl>
1 A         19-01-2021  12.7 
2 A         25-09-2020  11.7 
3 C         13-01-2021   3.67

summarise, which doesn't really summarise here, but the code is more concise:

test %>%
  arrange(time) %>%
  group_by(my_groups) %>% 
  summarise(final = rollapply(time, 3, max, by = 3, fill = NA, align = "left", partial = FALSE),
            mean_3 = rollapply(measure, 3, mean, by = 3, fill = NA, align = "left", partial = FALSE)) %>% 
  filter(!is.na(final))

`summarise()` has grouped output by 'my_groups'. You can override using the `.groups` argument.
# A tibble: 3 x 3
# Groups:   my_groups [2]
  my_groups final      mean_3
  <chr>     <chr>       <dbl>
1 A         19-01-2021  12.7 
2 A         25-09-2020  11.7 
3 C         13-01-2021   3.67

I'm surprised that rollapply with max doesn't work as expected. Maybe someone has an explanation for this or a better solution.
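One thing that may contribute to the surprise: time is stored as character, so max() and arrange() compare the strings character by character rather than chronologically. A minimal base R sketch using group A's dates from the question (the names dates_chr and dates are only illustrative):

```r
# Group A's dates from the question, stored as dd-mm-yyyy strings
dates_chr <- c("20-09-2020", "25-09-2020", "19-09-2020",
               "15-01-2021", "15-01-2021", "19-01-2021")

# String comparison: the string starting with the largest day number wins
max(dates_chr)
#> [1] "25-09-2020"

# Converting to Date makes max() compare chronologically
dates <- as.Date(dates_chr, format = "%d-%m-%Y")
format(max(dates), "%d-%m-%Y")
#> [1] "19-01-2021"
```

So the 25-09-2020 value is the largest string in group A, while the latest date chronologically is 19-01-2021. The same applies to arrange(time), which sorts these strings by day first.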

Another possible solution:

library(tidyverse)

test = data.frame(my_groups = c("A", "A", "A", "B", "B", "C", "C", "C",  "A", "A", "A"),
                  measure = c(10, 20, 5, 2, 62 ,2, 5, 4, 6, 7, 25),
                  time= c("20-09-2020", "25-09-2020", "19-09-2020", "20-05-2020", "20-06-2021", 
                          "11-01-2021", "13-01-2021", "13-01-2021", "15-01-2021", "15-01-2021", "19-01-2021"))

test %>% 
  group_by(data.table::rleid(my_groups)) %>% 
  filter(n() == 3) %>% 
  summarise(
    groups = unique(my_groups), 
    mean_3 = mean(measure), final_data = max(time), .groups = "drop") %>%
  select(-1)

#> # A tibble: 3 × 3
#>   groups mean_3 final_data
#>   <chr>   <dbl> <chr>     
#> 1 A       11.7  25-09-2020
#> 2 C        3.67 13-01-2021
#> 3 A       12.7  19-01-2021
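If the windows should follow chronological order within each group, rather than the row order or the string order of time, a variation is to convert time to Date first and number the windows explicitly. This is only a sketch under that assumption; the window column and the result name are illustrative and not part of the answers above.

```r
library(dplyr)

test <- data.frame(
  my_groups = c("A", "A", "A", "B", "B", "C", "C", "C", "A", "A", "A"),
  measure   = c(10, 20, 5, 2, 62, 2, 5, 4, 6, 7, 25),
  time      = c("20-09-2020", "25-09-2020", "19-09-2020", "20-05-2020", "20-06-2021",
                "11-01-2021", "13-01-2021", "13-01-2021", "15-01-2021", "15-01-2021",
                "19-01-2021")
)

result <- test %>%
  # Parse dd-mm-yyyy strings so sorting and max() are chronological
  mutate(time = as.Date(time, format = "%d-%m-%Y")) %>%
  arrange(my_groups, time) %>%
  group_by(my_groups) %>%
  # Rows 1-3 of a group get window 0, rows 4-6 get window 1, and so on
  mutate(window = (row_number() - 1) %/% 3) %>%
  group_by(my_groups, window) %>%
  # Drop incomplete windows (group B has only two rows)
  filter(n() == 3) %>%
  summarise(mean_3 = mean(measure), final_data = max(time), .groups = "drop")

result
```

With the sample data this gives three rows: group A with means of about 11.7 and 12.7 and final dates 2020-09-25 and 2021-01-19, and group C with a mean of about 3.67 and final date 2021-01-13. Because max(time) is evaluated per window rather than per group, each average is paired with the last date of its own three rows.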

