R: Use rollapply on list column

I have a data frame df with a dates column and a values column, and I want to calculate the 5% quantile of values over the past n days (including the current one) for every date in dates. The problem is that the dates are not unique: each date appears a different (random) number of times. For example:

library(lubridate)
library(tidyverse)
library(zoo)

n <- 3

dates_v <- seq(as_date("2018-09-01"), as_date("2018-09-14"), by = "days") 

df <- data.frame(dates = rep(dates_v,c(3, 2, 1, 4, 1, 5, 1, 3, 3, 2, 5, 3, 4, 3)), 
                 values = rep(seq(1,5),8))

I can write a for loop that solves this problem, but it is quite slow:

df2 <- list()

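# for() drops the Date class, so k is converted back with as_date()
for (k in dates_v[n:length(dates_v)]) {
  k <- as_date(k)

  df2 <- c(df2,
  df %>%
    # rows whose date lies in the window [k - (n - 1), k]
    filter(dates >= k %m-% days(n-1) & dates <= k) %>%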
    mutate(dates = k) %>%
    group_by(dates) %>%
    summarise(values = quantile(values, 0.05)) %>%
    list())
}

df2 <- df2 %>%
  bind_rows() 

I tried the zoo package, but the rollapply functions do not seem to be applicable here because the window sizes vary. One idea I had was to turn the values column into a nested column using tidyr::nest and then use rollapply to roll-concatenate the entries of the nested column:

df2 <- df %>%
  group_by(dates) %>%
  nest() %>%
  mutate(data = map(data, unlist))

df2$data <- rollapply(df2$data, width = n, c, align = "right")

df2 %>% 
  mutate(data = map(data, ~quantile(., 0.05)))

but that did not work. Am I doing something wrong, or does rollapply simply not work with list columns?
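A minimal sketch of one way to keep the list-column idea: instead of handing the list itself to rollapply, roll over the positions 1..length(list) and pull the matching list elements inside the function. It uses base split() in place of nest(), and by_date, q05 and result are illustrative names. It assumes no calendar day is missing from the data, because the window here is n consecutive dates present in the data rather than n calendar days.

```r
library(zoo)

n <- 3
dates_v <- seq(as.Date("2018-09-01"), as.Date("2018-09-14"), by = "days")
df <- data.frame(dates = rep(dates_v, c(3, 2, 1, 4, 1, 5, 1, 3, 3, 2, 5, 3, 4, 3)),
                 values = rep(seq(1, 5), 8))

# One list element per date holding all of that date's values
# (the same shape as the nested column built with nest())
by_date <- split(df$values, df$dates)

# Roll over the positions 1..length(by_date) rather than the list itself and
# concatenate the matching elements inside FUN; partial = TRUE gives the
# first n - 1 dates shorter windows.
# Assumption: no dates are missing, so n positions == n calendar days.
q05 <- rollapplyr(seq_along(by_date), n,
                  function(i) quantile(unlist(by_date[i], use.names = FALSE),
                                       0.05, names = FALSE),
                  partial = TRUE)

result <- data.frame(dates = as.Date(names(by_date)), q05 = q05)
```

Rolling over indices keeps rollapply working on a plain numeric vector, so the variable-size contents of each date only matter inside FUN.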

Edit:

A more realistic example for my use case is a data frame of the form:

dates_v <- seq(as_date("2018-01-01"), as_date("2018-09-14"), by = "days") 

df <- data.frame(dates = rep(dates_v,sample(seq(9000,11000), length(dates_v), replace = TRUE))) %>%
  mutate(values = rnorm(length(dates)))

Instead of a loop, you can use sapply like this:

n <- 3
sapply(unique(df$dates), function(x){
                        quantile(df$values[df$dates >= x - (n-1) & df$dates <= x], 0.05)
                      })
  5%   5%   5%   5%   5%   5%   5%   5%   5%   5%   5%   5%   5%   5% 
1.10 1.20 1.00 1.30 1.00 1.00 1.00 1.40 1.30 1.35 1.00 1.00 1.00 1.00 

To get the result into a data.frame, you could do this:

outcome <- data.frame(dates = unique(df$dates),
                      quantiles = sapply(unique(df$dates), function(x){
                        quantile(df$values[df$dates >= x - (n-1) & df$dates <= x], 0.05)
                      })
                      )
        dates quantiles
1  2018-09-01      1.10
2  2018-09-02      1.20
3  2018-09-03      1.00
4  2018-09-04      1.30
5  2018-09-05      1.00
6  2018-09-06      1.00
7  2018-09-07      1.00
8  2018-09-08      1.40
9  2018-09-09      1.30
10 2018-09-10      1.35
11 2018-09-11      1.00
12 2018-09-12      1.00
13 2018-09-13      1.00
14 2018-09-14      1.00
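The sapply version compares every row of df against each date, which gets expensive at the size shown in the Edit (millions of rows). A minimal sketch of the same computation, assuming the data can be sorted by date once: each window is then a contiguous block of rows, and base findInterval() locates its first and last row without scanning the whole column. The names ud, first_row and last_row are illustrative.

```r
n <- 3
dates_v <- seq(as.Date("2018-09-01"), as.Date("2018-09-14"), by = "days")
df <- data.frame(dates = rep(dates_v, c(3, 2, 1, 4, 1, 5, 1, 3, 3, 2, 5, 3, 4, 3)),
                 values = rep(seq(1, 5), 8))

# Sort once so every window is a contiguous run of rows
df <- df[order(df$dates), ]
ud <- unique(df$dates)

# Last row whose date is <= d, i.e. the last row of date d
last_row <- findInterval(ud, df$dates)
# left.open = TRUE counts the rows dated strictly before d - (n - 1);
# the window starts on the row right after them
first_row <- findInterval(ud - (n - 1), df$dates, left.open = TRUE) + 1

outcome <- data.frame(
  dates = ud,
  quantiles = mapply(function(a, b) quantile(df$values[a:b], 0.05, names = FALSE),
                     first_row, last_row)
)
```

Because the window start comes from date arithmetic, days missing from the data are handled the same way as in the sapply version.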

rollapplyr can be used with varying window widths by passing a vector of widths, w, with one width per element. Assuming df is sorted by dates, w[i] counts the rows from the first row dated dates[i] - (n - 1) up to and including row i; when no row has that date, the window starts at row 1, which is what the first n - 1 dates need. r then holds, for every row, the 5% quantile over its window, and the last line of code keeps only the last occurrence of each date (whose window covers all n days) and drops the values column.

w <- seq_along(df$dates) - match(df$dates - (n - 1), df$dates, nomatch = 1) + 1
r <- transform(df, `5%` = rollapplyr(values, w, quantile, 0.05), 
  check.names = FALSE)

r[!duplicated(df$dates, fromLast = TRUE), -2]

giving:

        dates   5%
3  2018-09-01 1.10
5  2018-09-02 1.20
6  2018-09-03 1.00
10 2018-09-04 1.30
11 2018-09-05 1.00
16 2018-09-06 1.00
17 2018-09-07 1.00
20 2018-09-08 1.40
23 2018-09-09 1.30
25 2018-09-10 1.35
30 2018-09-11 1.00
33 2018-09-12 1.00
37 2018-09-13 1.00
40 2018-09-14 1.00

or, with pipes and using w from above:

df %>%
   mutate(`5%` = rollapplyr(.$values, w, quantile, 0.05)) %>%
   filter(!duplicated(.$dates, fromLast = TRUE)) %>%
   select(-values)
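The w above is built by matching an exact earlier date, so if a calendar day were missing from the data the window would fall back to starting at row 1. A minimal sketch of a variant that avoids this, assuming rows are sorted by date: compute each width from the number of rows dated strictly before dates - (n - 1), using base findInterval(). The names q05 and res are illustrative.

```r
library(zoo)

n <- 3
dates_v <- seq(as.Date("2018-09-01"), as.Date("2018-09-14"), by = "days")
df <- data.frame(dates = rep(dates_v, c(3, 2, 1, 4, 1, 5, 1, 3, 3, 2, 5, 3, 4, 3)),
                 values = rep(seq(1, 5), 8))

# Rows must be sorted by date so each window is a contiguous run of rows
df <- df[order(df$dates), ]

# findInterval(..., left.open = TRUE) counts the rows dated strictly before
# dates - (n - 1); all rows after them up to row i form the window
w <- seq_along(df$dates) -
  findInterval(df$dates - (n - 1), df$dates, left.open = TRUE)

# rollapplyr forwards 0.05 and names = FALSE on to quantile()
q05 <- rollapplyr(df$values, w, quantile, 0.05, names = FALSE)

# Keep the last row of each date: its window covers the full n days
res <- data.frame(dates = df$dates, q05 = q05)[!duplicated(df$dates, fromLast = TRUE), ]
```

On the example data, where no day is missing, this gives the same quantiles as the sapply answer.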

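Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.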

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM