
Lubridate; Dplyr: how to aggregate a dataframe by week and category

Consider the following example:

library(dplyr)
library(lubridate)

# Daily dates, random values, and a random type label for each day
time <- seq(from = ymd("2014-01-01"), to = ymd("2014-02-20"), by = "days")
values <- sample(seq(from = 20, to = 50, by = 5), size = length(time), replace = TRUE)
tipe <- sample(c("Tipe_A", "Tipe_B", "Tipe_C"), size = length(time), replace = TRUE)

# tibble() replaces the deprecated data_frame()
df2 <- tibble(time, tipe, values)

# A tibble: 51 x 3
   time       tipe   values
   <date>     <chr>   <dbl>
 1 2014-01-01 Tipe_B     40
 2 2014-01-02 Tipe_B     30
 3 2014-01-03 Tipe_A     35
 4 2014-01-04 Tipe_A     50
 5 2014-01-05 Tipe_B     35
 6 2014-01-06 Tipe_B     50
 7 2014-01-07 Tipe_A     50
 8 2014-01-08 Tipe_B     40
 9 2014-01-09 Tipe_A     30
10 2014-01-10 Tipe_B     25
# ... with 41 more rows

I want to compute the difference between consecutive values and then aggregate this dataframe by week and by tipe.

So far I can only do this for one tipe at a time:

df2 %>%
  filter(tipe == "Tipe_A") %>%
  mutate(diff = values - lag(values, order_by = time)) %>%
  group_by(week = week(time)) %>%
  summarise(avr = mean(diff, na.rm = TRUE))

# A tibble: 7 x 2
   week    avr
  <dbl>  <dbl>
1     1   7.5 
2     2 -20   
3     3   3.33
4     5   0   
5     6  -3.33
6     7 -10   
7     8  25

But I have many types, so repeating this for each one would be tedious.

Is there a more efficient way to do this for every type at once?

Here, we may need to group by 'tipe' first, then compute 'diff' within each group, add 'week' as an additional grouping column, and only then take the mean in summarise:

library(dplyr)
df2 %>%
   group_by(tipe) %>% 
   mutate(diff = values - lag(values, order_by = time)) %>%
   group_by(week = week(time), .add = TRUE) %>%
   summarise(avr = mean(diff, na.rm = TRUE))

Or arrange first, so that a plain lag() already follows the time order within each 'tipe':

df2 %>%
   arrange(tipe, time) %>% 
   group_by(tipe) %>% 
   mutate(diff = values - lag(values)) %>%
   group_by(week = week(time), .add = TRUE) %>%
   summarise(avr = mean(diff, na.rm = TRUE))
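As a quick sanity check, the two variants above can be run on a small hand-made data set where the result is easy to verify by hand. This is only a sketch: the `toy` data frame and its values are made up for illustration and are not the asker's random data, and `.groups = "drop"` is added here just to return an ungrouped result.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data: 8 consecutive days, alternating between two types
toy <- tibble(
  time   = ymd("2014-01-01") + 0:7,
  tipe   = rep(c("Tipe_A", "Tipe_B"), times = 4),
  values = c(20, 50, 25, 40, 35, 45, 30, 55)
)

# Variant 1: lag() ordered by time within each tipe
by_order <- toy %>%
  group_by(tipe) %>%
  mutate(diff = values - lag(values, order_by = time)) %>%
  group_by(week = week(time), .add = TRUE) %>%
  summarise(avr = mean(diff, na.rm = TRUE), .groups = "drop")

# Variant 2: sort first, then a plain lag() within each tipe
by_arrange <- toy %>%
  arrange(tipe, time) %>%
  group_by(tipe) %>%
  mutate(diff = values - lag(values)) %>%
  group_by(week = week(time), .add = TRUE) %>%
  summarise(avr = mean(diff, na.rm = TRUE), .groups = "drop")

print(by_order)

# Both variants should agree row for row
print(isTRUE(all.equal(as.data.frame(by_order), as.data.frame(by_arrange))))
```

Because the grouping by 'tipe' happens before `lag()`, the first row of each type gets an `NA` difference instead of borrowing a value from another type. Note also that `week()` counts seven-day blocks starting on January 1, so for data spanning several years it may be worth grouping by `year(time)` as well, or by `floor_date(time, "week")`.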

