简体   繁体   中英

Dynamics “rollapply” in R

I am thinking of getting max of (N-1) rows by group in R (but not including the current row). Not sure how should I start with since each group has dynamics number of rows. For example:

我想要的例子

Thanks =)

Using DF defined reproducibly in the Note at the end, we can use rollapply to apply max taking the maximum of all prior values where specifying a width of list(-seq(.N)) to rollapply means to use offsets -1, -2, ..., -.N . partial=TRUE says to use whatever number of values are available among the specified offsets even if some of the offsets are not available. Finally run it on each group using by= .

This solution uses rollapply and data.table, as requested, and involves direct specification of which offsets to use as opposed to shifting or lagging.

We could change max to min , sum , mean , median or anything else that is appropriate if we wanted those. If we wanted to change this to just find the maximum of the prior k values then we could replace -seq(.N) with -seq(k) .

library(data.table)
library(zoo)

DT <- as.data.table(DF)
DT[, max := rollapply(Count, list(-seq(.N)), max, partial = TRUE, fill = NA), by = Group]

giving:

> DT
   Group Count max
1:     A    23  NA
2:     A    12  23
3:     A   145  23
4:     B   123  NA
5:     B    34 123
6:     B   456 123
7:     B   555 456

Note

The input DF in reproducible form is:

DF <- data.frame(Group = c("A", "A", "A", "B", "B", "B", "B"),
                 Count = c(23, 12, 145, 123, 34, 456, 555))

Update

Fixed original and updated it with a shorter solution and one which uses data.table as per data.table tag on question.

A possible solution with the data.table -package and the cummax and shift functions:

library(data.table)
setDT(df)[, lagged_max := shift(cummax(Count)), by = Group][]

which gives:

 > df Group Count lagged_max 1: A 23 NA 2: A 12 23 3: A 145 23 4: B 123 NA 5: B 34 123 6: B 456 123 7: B 555 456 

What this does:

  • setDT(df) converts the dataframe df to a data.table .
  • cummax(Count) gets the cumulative maximum of Count .
  • With shift the result is shifted one position upward because the default values for the n , type & fill parameters are 1L , lag an NA respectively, Written in full it is: shift(cummax(Count), n = 1L, type = 'lag', fill = NA) .

The same logic in base R:

transform(df, lagged_max = ave(Count, Group, FUN = function(x) c(NA, head(cummax(x), -1))))

Or with dplyr :

library(dplyr)
df %>% 
  group_by(Group) %>% 
  mutate(lagged_max = lag(cummax(Count)))

Used data:

df <- data.frame(Group = c("A", "A", "A", "B", "B", "B", "B"),
                 Count = c(23, 12, 145, 123, 34, 456, 555))

Here is solution using dplyr::lag :

# Sample data
df <- data.frame(
    Group = c("A", "A", "A", "B", "B", "B", "B"),
    Count = c(23, 12, 145, 123, 34, 456, 555))

library(tidyverse);
df %>%
    group_by(Group) %>%
    mutate(
        n1 = lag(Count, 1),
        n2 = lag(Count, 2),
        max = ifelse(!is.na(n2) & !is.na(n1) & n2 > n1, n2, n1)) %>%
    select(-n1, -n2)
## A tibble: 7 x 3
## Groups:   Group [2]
#  Group Count   max
#  <fct> <dbl> <dbl>
#1 A       23.   NA
#2 A       12.   23.
#3 A      145.   23.
#4 B      123.   NA
#5 B       34.  123.
#6 B      456.  123.
#7 B      555.  456.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM