lubridate and dplyr: how to aggregate a data frame by week and category
Consider the following example:
library(dplyr)
library(lubridate)
time <- seq(from = ymd("2014-01-01"), to = ymd("2014-02-20"), by = "days")
values <- sample(seq(from = 20, to = 50, by = 5), size = length(time), replace = TRUE)
tipe <- sample(c("Tipe_A", "Tipe_B", "Tipe_C"), size = length(time), replace = TRUE)
# data_frame() is deprecated; tibble() is the current equivalent
df2 <- tibble(time, tipe, values)
df2
# A tibble: 51 x 3
   time       tipe   values
   <date>     <chr>   <dbl>
 1 2014-01-01 Tipe_B     40
 2 2014-01-02 Tipe_B     30
 3 2014-01-03 Tipe_A     35
 4 2014-01-04 Tipe_A     50
 5 2014-01-05 Tipe_B     35
 6 2014-01-06 Tipe_B     50
 7 2014-01-07 Tipe_A     50
 8 2014-01-08 Tipe_B     40
 9 2014-01-09 Tipe_A     30
10 2014-01-10 Tipe_B     25
# ... with 41 more rows
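I want to compute the difference between consecutive values and aggregate this data frame by week and by tipe.
So far I have only managed to do this for one tipe at a time: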
df2 %>%
  filter(tipe == "Tipe_A") %>%
  mutate(diff = values - lag(values, order_by = time)) %>%
  group_by(week = week(time)) %>%
  summarise(avr = mean(diff, na.rm = TRUE))
# A tibble: 7 x 2
   week    avr
  <dbl>  <dbl>
1     1   7.5
2     2 -20
3     3   3.33
4     5   0
5     6  -3.33
6     7 -10
7     8  25
But I have many tipes, so repeating this for each one would be tedious.
Is there a more efficient way to do this for every tipe at once?
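Here, we may need to group by 'tipe' first, so that lag() only looks at the previous row of the same tipe when computing 'diff'. Then add 'week' as a further grouping column (.add = TRUE keeps the existing 'tipe' grouping) before taking the mean in summarise: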
library(dplyr)
library(lubridate)  # provides week()
df2 %>%
  group_by(tipe) %>%                                        # diff is computed within each tipe
  mutate(diff = values - lag(values, order_by = time)) %>%
  group_by(week = week(time), .add = TRUE) %>%              # keep tipe, add week
  summarise(avr = mean(diff, na.rm = TRUE))
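As a small illustration of why `.add = TRUE` matters, the sketch below compares the grouping variables with and without it. The data frame `toy` and the names `replaced` and `added` are made up for this example and are not part of the original question. The `.add` argument assumes dplyr 1.0.0 or later; older versions spelled it `add`.

```r
library(dplyr)

# Hypothetical toy data, only used to inspect the grouping
toy <- tibble(
  tipe = c("Tipe_A", "Tipe_A", "Tipe_B"),
  week = c(1, 2, 1),
  x    = 1:3
)

# A second group_by() replaces the existing grouping by default
replaced <- toy %>%
  group_by(tipe) %>%
  group_by(week) %>%
  group_vars()

# With .add = TRUE the new variable is appended to the existing groups
added <- toy %>%
  group_by(tipe) %>%
  group_by(week, .add = TRUE) %>%
  group_vars()

replaced
added
```

Without `.add = TRUE`, the summary would be computed per week across all tipes rather than per tipe and week.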
Alternatively, arrange the rows by 'tipe' and 'time' first, so that a plain lag() can rely on row order:
df2 %>%
  arrange(tipe, time) %>%
  group_by(tipe) %>%
  mutate(diff = values - lag(values)) %>%
  group_by(week = week(time), .add = TRUE) %>%
  summarise(avr = mean(diff, na.rm = TRUE))
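To check that the two variants agree, here is a minimal sketch on a small fixed data frame whose rows are deliberately not in time order. The data in `toy` and the names `by_order` and `by_arrange` are invented for illustration. `.groups = "drop"` is added only to return ungrouped results for the comparison.

```r
library(dplyr)
library(lubridate)

# Hypothetical toy data: rows are shuffled so row order differs from time order
toy <- tibble(
  time   = ymd("2014-01-01") + c(3, 0, 8, 1, 7, 2),
  tipe   = c("Tipe_A", "Tipe_A", "Tipe_B", "Tipe_B", "Tipe_A", "Tipe_B"),
  values = c(35, 20, 50, 30, 45, 25)
)

# Variant 1: lag() is told explicitly to order by time
by_order <- toy %>%
  group_by(tipe) %>%
  mutate(diff = values - lag(values, order_by = time)) %>%
  group_by(week = week(time), .add = TRUE) %>%
  summarise(avr = mean(diff, na.rm = TRUE), .groups = "drop")

# Variant 2: sort first, then use a plain lag()
by_arrange <- toy %>%
  arrange(tipe, time) %>%
  group_by(tipe) %>%
  mutate(diff = values - lag(values)) %>%
  group_by(week = week(time), .add = TRUE) %>%
  summarise(avr = mean(diff, na.rm = TRUE), .groups = "drop")

# Compare the two summaries as plain data frames
all.equal(as.data.frame(by_order), as.data.frame(by_arrange))
```

The comparison should report that the two results are equal. Note that dropping both the `arrange()` step and `order_by` would compute differences in row order rather than time order.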