
splitting the data frame based on condition

I have a data frame called df with 20 rows and two variables, test_value and day. I would like to create a new variable called test_x_max that captures the maximum test_value over the previous x records. For example, if x is 5 and we are looking at row 15, it should pick the maximum test_value between day 10 and day 15. How can I achieve this? Thanks in advance. Pavan

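You can use zoo::rollmax combined with cummax. With align = "right", rollmax only returns a value once a full window of 5 rows (the current row and the 4 before it) is available, so cummax supplies the running maximum for the first 4 rows: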

library(zoo)

df$test_x_max <- c(cummax(df$test_value[1:4]), rollmax(df$test_value, 5, align = "right"))

For example:

set.seed(100)
df <- data.frame(day = 1:20, test_value = sample(20))
df$test_x_max <- c(cummax(df$test_value[1:4]), rollmax(df$test_value, 5, align = "right"))
df
#>    day test_value test_x_max
#> 1    1         10         10
#> 2    2          6         10
#> 3    3         16         16
#> 4    4         14         16
#> 5    5         12         16
#> 6    6          7         16
#> 7    7         19         19
#> 8    8         17         19
#> 9    9          4         19
#> 10  10         15         19
#> 11  11         13         19
#> 12  12          2         17
#> 13  13         11         15
#> 14  14          8         15
#> 15  15          3         13
#> 16  16          9         11
#> 17  17          1         11
#> 18  18         20         20
#> 19  19         18         20
#> 20  20          5         20
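
If you would rather not depend on zoo, the same trailing maximum can be sketched in base R. The helper name rolling_max is hypothetical and not part of any package; it assumes, like the answer above, that the window covers the current row plus the window - 1 rows before it, and that the window simply shrinks near the start of the data.

```r
# Hypothetical helper (not from the original answers): trailing maximum over
# the current element and up to `window - 1` elements before it.
rolling_max <- function(x, window) {
  vapply(
    seq_along(x),
    function(i) {
      start <- max(1L, i - window + 1L)  # shorten the window near the start
      as.numeric(max(x[start:i]))
    },
    numeric(1)
  )
}

# First ten test_value entries from the example above
test_value <- c(10, 6, 16, 14, 12, 7, 19, 17, 4, 15)
result <- rolling_max(test_value, 5)
print(result)
#> [1] 10 10 16 16 16 16 19 19 19 19
```

The result matches the first ten rows of test_x_max above. This loops over every row, so for long vectors the zoo version is likely to be faster.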

The fairly new slider package is a good fit if you prefer tidyverse style:

library(dplyr)
library(slider)

set.seed(2020)

pretend_df <- tibble(
  day = 1:20,
  testvalue = sample(100, 20)
)

# if you MUST have a complete window (the current row plus the 5 rows before it)
slide_dbl(pretend_df, ~ max(.x$testvalue), .before = 5, .complete = TRUE)
#>  [1] NA NA NA NA NA 88 88 88 88 70 70 72 93 93 93 93 93 93 80 82

# if you want to accept incomplete windows at the start of the data
slide_dbl(pretend_df, ~ max(.x$testvalue), .before = 5, .complete = FALSE)
#>  [1] 28 87 87 88 88 88 88 88 88 70 70 72 93 93 93 93 93 93 80 82

pretend_df$maxlast5 <- slide_dbl(pretend_df, ~ max(.x$testvalue), .before = 5, .complete = TRUE)

pretend_df
#> # A tibble: 20 x 3
#>      day testvalue maxlast5
#>    <int>     <int>    <dbl>
#>  1     1        28       NA
#>  2     2        87       NA
#>  3     3        22       NA
#>  4     4        88       NA
#>  5     5        65       NA
#>  6     6        17       88
#>  7     7        36       88
#>  8     8        42       88
#>  9     9        70       88
#> 10    10        49       70
#> 11    11        56       70
#> 12    12        72       72
#> 13    13        93       93
#> 14    14        80       93
#> 15    15        29       93
#> 16    16         3       93
#> 17    17        66       93
#> 18    18         4       93
#> 19    19        78       80
#> 20    20        82       82
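
Note that .before counts only the rows before the current one, so .before = 5 gives a window of 6 rows (days 10 to 15 for row 15, as in the question), while the zoo answer uses 5 rows in total. A small sketch of the difference, using a toy vector that is not from the original post:

```r
library(slider)

# Toy vector (not from the original post) chosen so the two windows differ
x <- c(9, 1, 2, 3, 4, 5, 6)

# .before = 4: current element plus the 4 before it, i.e. 5 elements in total
max_5 <- slide_dbl(x, max, .before = 4)

# .before = 5: current element plus the 5 before it, i.e. 6 elements in total
max_6 <- slide_dbl(x, max, .before = 5)

print(max_5)
#> [1] 9 9 9 9 9 5 6
print(max_6)
#> [1] 9 9 9 9 9 9 6
```

Which setting is right depends on whether "previous x records" is meant to include the current row.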
