dplyr, lubridate: how to aggregate a dataframe by week?
Consider the following example:
library(tidyverse)
library(lubridate)
# one row per day, from Monday 2014-02-24 to Thursday 2014-03-20 (25 days)
time <- seq(from = ymd("2014-02-24"), to = ymd("2014-03-20"), by = "days")
set.seed(123)
values <- sample(seq(from = 20, to = 50, by = 5), size = length(time), replace = TRUE)
df2 <- tibble(time, values)  # data_frame() is deprecated; tibble() is the current equivalent
df2 <- df2 %>% mutate(day_of_week = wday(time, label = TRUE))
df2
Source: local data frame [25 x 3]
time values day_of_week
<date> <dbl> <fctr>
1 2014-02-24 30 Mon
2 2014-02-25 45 Tues
3 2014-02-26 30 Wed
4 2014-02-27 50 Thurs
5 2014-02-28 50 Fri
6 2014-03-01 20 Sat
7 2014-03-02 35 Sun
8 2014-03-03 50 Mon
9 2014-03-04 35 Tues
10 2014-03-05 35 Wed
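I would like to aggregate this data frame by week.
That is, suppose I define a week as starting Monday morning and ending Sunday night; call it a Monday to Monday
cycle. (Importantly, I want to be able to choose other conventions, such as Friday to Friday.)
Then I just want to compute the mean of values
for each week.
For example, in the data above, we would compute the mean of values
from Monday, February 24 to Sunday, March 2, and so on.
How can I do this?
Thanks!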
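Edit: thanks to everyone who contributed ideas. Somewhat unusually, I think my own later solution may be the more appropriate one. Thanks again!
In the tidyverse: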
df2 %>% group_by(week = week(time)) %>% summarise(value = mean(values))
## # A tibble: 5 × 2
## week value
## <dbl> <dbl>
## 1 8 37.50000
## 2 9 38.57143
## 3 10 38.57143
## 4 11 36.42857
## 5 12 45.00000
Or use isoweek
instead:
df2 %>% group_by(week = isoweek(time)) %>% summarise(value = mean(values))
## # A tibble: 4 × 2
## week value
## <int> <dbl>
## 1 9 37.14286
## 2 10 40.71429
## 3 11 35.00000
## 4 12 42.50000
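The two groupings differ because week() counts complete seven-day periods since January 1, so its weeks start on whatever weekday January 1 fell on (a Wednesday in 2014), whereas isoweek() uses ISO 8601 weeks that run Monday to Sunday. A small sketch checking a few dates around both boundaries:

```r
library(lubridate)

# Tue 2014-02-25, Wed 2014-02-26, Sun 2014-03-02, Mon 2014-03-03
d <- ymd(c("2014-02-25", "2014-02-26", "2014-03-02", "2014-03-03"))

week(d)     # 8 9 9 9  : week() rolls over between Tuesday and Wednesday
isoweek(d)  # 9 9 9 10 : isoweek() rolls over between Sunday and Monday
```

This is why the first week() group above contains only two days (Monday 24 and Tuesday 25 February).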
Or cut.Date:
df2 %>% group_by(week = cut(time, "week")) %>% summarise(value = mean(values))
## # A tibble: 4 × 2
## week value
## <fctr> <dbl>
## 1 2014-02-24 37.14286
## 2 2014-03-03 40.71429
## 3 2014-03-10 35.00000
## 4 2014-03-17 42.50000
If you prefer, you can tell it to start on Sunday:
df2 %>% group_by(week = cut(time, "week", start.on.monday = FALSE)) %>%
summarise(value = mean(values))
## # A tibble: 4 × 2
## week value
## <fctr> <dbl>
## 1 2014-02-23 37.50000
## 2 2014-03-02 40.00000
## 3 2014-03-09 33.57143
## 4 2014-03-16 44.00000
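To start the week on another day, shift the dates before cutting. Adding one day, as below, makes each week run Sunday to Saturday (which is why the means match the Sunday-start output above); subtracting one instead would make the weeks start on Tuesday: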
df2 %>% group_by(week = cut(time + 1, "week")) %>% summarise(value = mean(values))
## # A tibble: 4 × 2
## week value
## <fctr> <dbl>
## 1 2014-02-24 37.50000
## 2 2014-03-03 40.00000
## 3 2014-03-10 33.57143
## 4 2014-03-17 44.00000
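The labels will be off by one day, though: each group is labeled with the Monday of the shifted dates rather than its actual first day. If you use cut, consider the implications of its include.lowest and right arguments, documented in ?cut.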
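One way to keep cut() but get correct labels is to shift the dates back by an offset, let cut() label each week with its Monday, and then add the same offset back to the label. The sketch below assumes Friday-to-Thursday weeks (the Friday convention from the question); `offset` is a hypothetical variable name for the number of days after Monday on which each week should start.

```r
library(dplyr)
library(lubridate)

time <- seq(from = ymd("2014-02-24"), to = ymd("2014-03-20"), by = "days")
set.seed(123)
values <- sample(seq(from = 20, to = 50, by = 5), size = length(time), replace = TRUE)
df2 <- tibble(time, values)

# Hypothetical parameter: days after Monday on which each week starts
# (0 = Monday, 1 = Tuesday, ..., 4 = Friday, 6 = Sunday).
offset <- 4

weekly <- df2 %>%
  # cut() labels each shifted date with the Monday of its week;
  # adding the offset back turns that label into the week's real first day.
  mutate(week = as.Date(cut(time - offset, "week")) + offset) %>%
  group_by(week) %>%
  summarise(value = mean(values), n_days = n())

weekly
```

With offset = 4 every label is a Friday, and n_days makes the partial first week (Monday 24 to Thursday 27 February) visible.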
Why not just use floor_date
with an integer to adjust the day the week starts on?
library(dplyr)
library(lubridate)
time <- seq(from = ymd("2014-02-24"), to = ymd("2014-03-20"), by = "days")
set.seed(123)
values <- sample(seq(from = 20, to = 50, by = 5), size = length(time), replace = TRUE)
df2 <- tibble(time, values)
df2 <- df2 %>% mutate(day_of_week = weekdays(time))
# week Wednesday to Tuesday: shift back 3 days, then floor to the default Sunday week start
# (the Week label is therefore the Sunday of the shifted date, not the actual first day)
df2 %>% group_by(Week = floor_date(time - 3, unit = "week")) %>%
  summarize(WeeklyAveDist = mean(values), mean(values), min_date = min(time), max_date = max(time)) %>%
  mutate(weekdays(min_date), weekdays(max_date))
Week WeeklyAveDist mean.values. min_date max_date
1 2014-02-16 37.50000 37.50000 2014-02-24 2014-02-25
2 2014-02-23 38.57143 38.57143 2014-02-26 2014-03-04
3 2014-03-02 38.57143 38.57143 2014-03-05 2014-03-11
4 2014-03-09 36.42857 36.42857 2014-03-12 2014-03-18
5 2014-03-16 45.00000 45.00000 2014-03-19 2014-03-20
weekdays.min_date. weekdays.max_date.
1 Monday Tuesday
2 Wednesday Tuesday
3 Wednesday Tuesday
4 Wednesday Tuesday
5 Wednesday Thursday
# week Thursday to Wednesday: shift back 4 days instead
df2 %>% group_by(Week = floor_date(time - 4, unit = "week")) %>%
  summarize(WeeklyAveDist = mean(values), mean(values), min_date = min(time), max_date = max(time)) %>%
  mutate(weekdays(min_date), weekdays(max_date))
Week WeeklyAveDist mean.values. min_date max_date
1 2014-02-16 35.00000 35.00000 2014-02-24 2014-02-26
2 2014-02-23 39.28571 39.28571 2014-02-27 2014-03-05
3 2014-03-02 37.14286 37.14286 2014-03-06 2014-03-12
4 2014-03-09 40.00000 40.00000 2014-03-13 2014-03-19
5 2014-03-16 40.00000 40.00000 2014-03-20 2014-03-20
weekdays.min_date. weekdays.max_date.
1 Monday Wednesday
2 Thursday Wednesday
3 Thursday Wednesday
4 Thursday Wednesday
5 Thursday Thursday
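Recent lubridate releases also give floor_date() a week_start argument (1 = Monday, ..., 7 = Sunday), which removes the need for the shift and makes the Week label the real first day of each week. A sketch for Wednesday-to-Tuesday weeks, assuming a lubridate version that has this argument:

```r
library(dplyr)
library(lubridate)

time <- seq(from = ymd("2014-02-24"), to = ymd("2014-03-20"), by = "days")
set.seed(123)
values <- sample(seq(from = 20, to = 50, by = 5), size = length(time), replace = TRUE)
df2 <- tibble(time, values)

# week_start = 3 rounds each date down to the most recent Wednesday
# (numbering for this argument: 1 = Monday, ..., 7 = Sunday).
weekly <- df2 %>%
  group_by(Week = floor_date(time, unit = "week", week_start = 3)) %>%
  summarise(WeeklyAveDist = mean(values),
            min_date = min(time),
            max_date = max(time))

weekly
```

The groups are the same as with floor_date(time - 3, ...), but the first label is Wednesday 2014-02-19 instead of Sunday 2014-02-16.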
aggregate(df2$values,by=list(week(df2$time)),mean)
  Group.1        x
1       8 30.00000
2       9 40.00000
3      10 36.42857
4      11 37.85714
5      12 43.33333
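This uses lubridate's week
function, which gives the week number within the year.
To control which day of the week the week starts on, see the existing thread on that topic.
nograpes' solution from that thread shows that if you want a custom version of the week()
function that can treat any day of the week as the start of the week, you can build it from base R, like this: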
# %w gives the weekday as 0 = Sunday, ..., 6 = Saturday; the named lookup vectors
# give the number of days back to Monday (start) or forward to Sunday (end)
start.of.week <- function(date) date - (setNames(c(6,0:5),0:6) [strftime(date,'%w')])
end.of.week <- function(date) date + (setNames(c(0,6:1),0:6) [strftime(date,'%w')])
start.of.week(as.Date(c('2014-01-05','2014-10-02','2014-09-22','2014-09-27')))
# "2013-12-30" "2014-09-29" "2014-09-22" "2014-09-22"
end.of.week(as.Date(c('2014-01-05','2014-10-02','2014-09-22','2014-09-27')))
# "2014-01-05" "2014-10-05" "2014-09-28" "2014-09-28"
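The lookup vectors above hard-code a Monday start. A possible generalization in base R, using a hypothetical helper week_start_date(): take the weekday from as.POSIXlt()$wday (0 = Sunday, ..., 6 = Saturday), compute the days elapsed since the chosen start day with modular arithmetic, and subtract them. The data set-up at the end reuses the question's example only to show the helper as a grouping key.

```r
# Hypothetical helper: the first day of the week containing each date.
# start_wday follows the %w / POSIXlt convention: 0 = Sunday, 1 = Monday, ..., 6 = Saturday.
week_start_date <- function(date, start_wday = 1) {
  date <- as.Date(date)
  wday <- as.POSIXlt(date)$wday
  # Days elapsed since the most recent start day (0 when the date is itself a start day).
  date - (wday - start_wday) %% 7
}

x <- as.Date(c("2014-01-05", "2014-10-02", "2014-09-22", "2014-09-27"))
week_start_date(x, 1)      # "2013-12-30" "2014-09-29" "2014-09-22" "2014-09-22"
week_start_date(x, 1) + 6  # "2014-01-05" "2014-10-05" "2014-09-28" "2014-09-28"

# Friday-to-Thursday weekly means of the question's data, base R only.
time <- seq(as.Date("2014-02-24"), as.Date("2014-03-20"), by = "day")
set.seed(123)
values <- sample(seq(from = 20, to = 50, by = 5), size = length(time), replace = TRUE)
df2 <- data.frame(time, values)
weekly <- aggregate(values ~ week,
                    data = transform(df2, week = week_start_date(time, 5)),
                    FUN = mean)
weekly
```

For start_wday = 1 this reproduces start.of.week() and end.of.week() on the same dates; any other start day only changes the second argument.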
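In the future, lubridate
will have an option for an arbitrary week start day, but Hadley has not added it yet ( https://github.com/hadley/lubridate/issues/257 ).
For once, after some research, I actually think I came up with a better solution.
The following example uses weeks that start on Thursday. Each week is labeled with the first day of its cycle.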
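library(tidyverse)
library(lubridate)
options(tibble.print_min = 30)
time <- seq(from = ymd("2014-02-24"), to = ymd("2014-03-20"), by = "days")
set.seed(123)
values <- sample(seq(from = 20, to = 50, by = 5), size = length(time), replace = TRUE)
df2 <- tibble(time, values)
# by default wday() numbers the days 1 = Sunday, ..., 5 = Thursday, ..., 7 = Saturday
df2 <- df2 %>% mutate(day_of_week_label = wday(time, label = TRUE),
                      day_of_week = wday(time, label = FALSE))
# (day_of_week - 5) %% 7 is the number of days since the most recent Thursday;
# subtracting it maps each date to the Thursday that starts its cycle.
# tmp_1 and tmp_2 only show the intermediate steps.
df2 <- df2 %>% mutate(thursday_cycle = time - ((as.integer(day_of_week) - 5) %% 7),
                      tmp_1 = (as.integer(day_of_week) - 5),
                      tmp_2 = ((as.integer(day_of_week) - 5) %% 7))
This gives: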
> df2
# A tibble: 25 × 7
time values day_of_week_label day_of_week thursday_cycle tmp_1 tmp_2
<date> <dbl> <ord> <dbl> <date> <dbl> <dbl>
1 2014-02-24 30 Mon 2 2014-02-20 -3 4
2 2014-02-25 45 Tues 3 2014-02-20 -2 5
3 2014-02-26 30 Wed 4 2014-02-20 -1 6
4 2014-02-27 50 Thurs 5 2014-02-27 0 0
5 2014-02-28 50 Fri 6 2014-02-27 1 1
6 2014-03-01 20 Sat 7 2014-02-27 2 2
7 2014-03-02 35 Sun 1 2014-02-27 -4 3
8 2014-03-03 50 Mon 2 2014-02-27 -3 4
9 2014-03-04 35 Tues 3 2014-02-27 -2 5
10 2014-03-05 35 Wed 4 2014-02-27 -1 6
11 2014-03-06 50 Thurs 5 2014-03-06 0 0
12 2014-03-07 35 Fri 6 2014-03-06 1 1
13 2014-03-08 40 Sat 7 2014-03-06 2 2
14 2014-03-09 40 Sun 1 2014-03-06 -4 3
15 2014-03-10 20 Mon 2 2014-03-06 -3 4
16 2014-03-11 50 Tues 3 2014-03-06 -2 5
17 2014-03-12 25 Wed 4 2014-03-06 -1 6
18 2014-03-13 20 Thurs 5 2014-03-13 0 0
19 2014-03-14 30 Fri 6 2014-03-13 1 1
20 2014-03-15 50 Sat 7 2014-03-13 2 2
21 2014-03-16 50 Sun 1 2014-03-13 -4 3
22 2014-03-17 40 Mon 2 2014-03-13 -3 4
23 2014-03-18 40 Tues 3 2014-03-13 -2 5
24 2014-03-19 50 Wed 4 2014-03-13 -1 6
25 2014-03-20 40 Thurs 5 2014-03-20 0 0
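The table above builds the cycle labels but stops before the averaging the question asks for. A sketch of the remaining step, with the start day pulled out into a variable; start_day is a hypothetical name, and it assumes wday()'s default numbering (1 = Sunday, ..., 7 = Saturday), so 5 is Thursday and 6 would be Friday.

```r
library(dplyr)
library(lubridate)

time <- seq(from = ymd("2014-02-24"), to = ymd("2014-03-20"), by = "days")
set.seed(123)
values <- sample(seq(from = 20, to = 50, by = 5), size = length(time), replace = TRUE)
df2 <- tibble(time, values)

# Hypothetical parameter: day that starts each cycle, in wday() numbering.
start_day <- 5

weekly <- df2 %>%
  # Step back to the most recent start day, as in the thursday_cycle column above.
  mutate(cycle_start = time - (wday(time) - start_day) %% 7) %>%
  group_by(cycle_start) %>%
  summarise(value = mean(values), n_days = n())

weekly
```

Setting start_day to 6 gives the Friday-to-Thursday convention from the question; n_days shows that the first and last cycles are partial.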