R: Use rollapply on list column
I have a data frame df with a dates column and a values column, and I want to calculate the 5% quantile of values over the past n days for every date in dates. The problem is that the dates are not unique: each date appears a different (random) number of times. For example:
library(lubridate)
library(tidyverse)
library(zoo)
n <- 3
dates_v <- seq(as_date("2018-09-01"), as_date("2018-09-14"), by = "days")
df <- data.frame(dates = rep(dates_v,c(3, 2, 1, 4, 1, 5, 1, 3, 3, 2, 5, 3, 4, 3)),
values = rep(seq(1,5),8))
I can write a for loop that solves this problem, but it is quite slow:
df2 <- list()
for (k in dates_v[n:length(dates_v)]) {
k <- as_date(k)
df2 <- c(df2,
df %>%
filter(dates >= k %m-% days(n-1) & dates <= k) %>%
mutate(dates = k) %>%
group_by(dates) %>%
summarise(values = quantile(values, 0.05)) %>%
list())
}
df2 <- df2 %>%
bind_rows()
I tried the zoo package, but the rollapply functions do not seem to be applicable here because the window sizes vary. One idea I had was to turn the values column into a nested list column using tidyr::nest and then use rollapply to roll-concatenate the entries of the nested column:
df2 <- df %>%
group_by(dates) %>%
nest() %>%
mutate(data = map(data, unlist))
df2$data <- rollapply(df2$data, width = n, c, align = "right")
df2 %>%
mutate(data = map(data, ~quantile(., 0.05)))
but that did not work out. Is there something I am doing wrong, or does rollapply simply not work with list columns?
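The roll-concatenate idea itself can be expressed without rollapply, as a plain lapply over the list of per-date vectors. This is a minimal base R sketch under stated assumptions: the toy list vals (a hypothetical name) holds the values of the first four dates of the example data, and the code assumes exactly one list element per consecutive calendar day, so a missing date would silently widen the window.

```r
# Toy data: the values of the first four dates of the example above,
# one numeric vector per date, as a nested list column would hold them
vals <- list(c(1, 2, 3), c(4, 5), 1, c(2, 3, 4, 5))
names(vals) <- as.character(as.Date("2018-09-01") + 0:3)
n <- 3

# Roll-concatenate: element i collects the values of the (up to) n most
# recent dates. Assumes one list element per consecutive calendar day.
windows <- lapply(seq_along(vals), function(i) {
  unlist(vals[max(1, i - n + 1):i], use.names = FALSE)
})

# 5% quantile (default type 7) of each concatenated window
q05 <- vapply(windows, quantile, numeric(1), probs = 0.05, names = FALSE)
names(q05) <- names(vals)
print(q05)
```

For these four dates the quantiles are 1.1, 1.2, 1.0 and 1.3, the same values the sapply approach reports for 2018-09-01 to 2018-09-04.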
Edit: A more realistic example for my use case is a data frame of the form:
dates_v <- seq(as_date("2018-01-01"), as_date("2018-09-14"), by = "days")
df <- data.frame(dates = rep(dates_v,sample(seq(9000,11000), length(dates_v), replace = TRUE))) %>%
mutate(values = rnorm(length(dates)))
Instead of a loop you can use sapply like this:
n <- 3
sapply(unique(df$dates), function(x){
quantile(df$values[df$dates >= x - (n-1) & df$dates <= x], 0.05)
})
5% 5% 5% 5% 5% 5% 5% 5% 5% 5% 5% 5% 5% 5%
1.10 1.20 1.00 1.30 1.00 1.00 1.00 1.40 1.30 1.35 1.00 1.00 1.00 1.00
To get it into a data.frame you could do this:
outcome <- data.frame(dates = unique(df$dates),
quantiles = sapply(unique(df$dates), function(x){
quantile(df$values[df$dates >= x - (n-1) & df$dates <= x], 0.05)
})
)
dates quantiles
1 2018-09-01 1.10
2 2018-09-02 1.20
3 2018-09-03 1.00
4 2018-09-04 1.30
5 2018-09-05 1.00
6 2018-09-06 1.00
7 2018-09-07 1.00
8 2018-09-08 1.40
9 2018-09-09 1.30
10 2018-09-10 1.35
11 2018-09-11 1.00
12 2018-09-12 1.00
13 2018-09-13 1.00
14 2018-09-14 1.00
rollapply can be used with varying widths by specifying a vector of widths, w, one per element. r gives the quantiles for all rows from the first row dated dates - 2 (that is, n - 1 days earlier) up to the current row, and the last line of code drops every row whose date is not the last occurrence of that date and also drops the values column.
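A small worked example of the width vector may help. The dates below are hypothetical toy data (six sorted rows over four consecutive days). To include the first row dated dates - (n - 1) itself, as described above, the width is the current row index minus that start index plus 1; dates before the start of the data fall back to row 1 so the window covers all earlier rows.

```r
# Hypothetical toy dates: 6 rows over 4 consecutive days, sorted
dates <- as.Date("2018-09-01") + c(0, 0, 1, 2, 2, 3)
n <- 3

# Index of the first row dated dates - (n - 1); a date before the start
# of the data falls back to row 1
start <- match(dates - (n - 1), dates, nomatch = 1)

# Right-aligned window width for each row, counting the start row itself
w <- seq_along(dates) - start + 1
print(w)
# [1] 1 2 3 4 5 4
```

For the sixth row (2018-09-04) the width is 4, i.e. rows 3 to 6, which covers 2018-09-02 through 2018-09-04.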
# width of each right-aligned window: from the first row dated dates - (n - 1)
# up to and including the current row (rows before the data start fall back to row 1)
w <- seq_along(df$dates) - match(df$dates - (n - 1), df$dates, nomatch = 1) + 1
r <- transform(df, `5%` = rollapplyr(values, w, quantile, 0.05),
check.names = FALSE)
# keep only the last row of each date and drop the values column
r[!duplicated(df$dates, fromLast = TRUE), -2]
giving:
dates 5%
3 2018-09-01 1.10
5 2018-09-02 1.20
6 2018-09-03 1.00
10 2018-09-04 1.30
11 2018-09-05 1.00
16 2018-09-06 1.00
17 2018-09-07 1.00
20 2018-09-08 1.40
23 2018-09-09 1.30
25 2018-09-10 1.35
30 2018-09-11 1.00
33 2018-09-12 1.00
37 2018-09-13 1.00
40 2018-09-14 1.00
or, with pipes and using w from above:
df %>%
mutate(`5%` = rollapplyr(.$values, w, quantile, 0.05)) %>%
filter(!duplicated(.$dates, fromLast = TRUE)) %>%
select(-values)
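The match-based width assumes that the date n - 1 days back is present in the data; if that calendar day is missing, match returns the nomatch value and the window silently covers the whole history. A sketch that tolerates gaps, assuming rows are sorted by date, uses findInterval with left.open = TRUE to count the rows dated strictly before the window start. The toy data frame (with 2018-09-03 missing) and the q05 column name are hypothetical.

```r
library(zoo)

# Hypothetical toy data with a gap: 2018-09-03 does not occur at all
df <- data.frame(
  dates  = as.Date("2018-09-01") + c(0, 0, 1, 3, 3, 4),
  values = c(5, 1, 2, 4, 3, 6)
)
n <- 3
df <- df[order(df$dates), ]  # the width trick requires rows sorted by date

# Number of rows dated strictly before dates - (n - 1); the window for
# row i therefore runs from row before[i] + 1 to row i
before <- findInterval(as.numeric(df$dates - (n - 1)), as.numeric(df$dates),
                       left.open = TRUE)
w <- seq_along(df$dates) - before

# Rolling 5% quantile over the variable-width right-aligned windows
df$q05 <- rollapplyr(df$values, w, quantile, probs = 0.05, names = FALSE)

# Keep the last row of each date, as in the answer above
res <- df[!duplicated(df$dates, fromLast = TRUE), c("dates", "q05")]
print(res)
```

Here the window for 2018-09-05 is rows 4 to 6 (2018-09-04 and 2018-09-05 only), whereas match on the missing 2018-09-03 would have widened it to all six rows.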