More efficient way to compute mean for subset

Question

In this dataframe:

df <- data.frame(
  comp = c("pre",rep("story",4), rep("x",2), rep("story",3)),
  hbr = c(101:110)
)

Say I need to compute the mean of hbr for the first stretch of rows where comp == "story". How can I do that more efficiently than the approach below? It seems bulky and long-winded, and it requires me to specify manually the grp I want the mean for:

library(dplyr)
library(data.table)  # provides rleid()
df %>%
  mutate(grp = rleid(comp)) %>%
  summarise(M = mean(hbr[grp == 2]))
#>       M
#> 1 103.5

Answer 1

I'm not sure if this is any better, but at least you only need to specify that you want the first run of 'story':

df %>%
  mutate(grp = ifelse(comp == 'story', rleid(comp), NA)) %>%
  filter(grp == min(grp, na.rm = TRUE)) %>%
  summarise(M = mean(hbr))
#>       M
#> 1 103.5
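
For comparison, the same "first run of 'story'" idea can be sketched without dplyr or data.table. This is only an illustration, not part of the answer above: it assumes base R's rle() as a stand-in for rleid(), and the names r, grp and first_story are made up for the example.

```r
# Same data as in the question
df <- data.frame(
  comp = c("pre", rep("story", 4), rep("x", 2), rep("story", 3)),
  hbr = 101:110
)

# Run-length encode comp: one entry per stretch of identical values.
# as.character() guards against comp being a factor (older R defaults).
r <- rle(as.character(df$comp))
# r$values:  "pre" "story" "x" "story"
# r$lengths: 1 4 2 3

# Run id for every row (base R stand-in for data.table::rleid)
grp <- rep(seq_along(r$lengths), r$lengths)

# Position of the first run whose value is "story"
first_story <- which(r$values == "story")[1]

M <- mean(df$hbr[grp == first_story])
M
#> [1] 103.5
```

As in the answer above, nothing here hard-codes the group number; it is looked up from the run values.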

Answer 2

In base R, you can select the desired rows using which, diff and cumsum: find the rows where comp is "story", number the consecutive runs among those rows, and keep the run you need (here it's the first, so 1). Then compute the mean on those rows. With this option you don't have to look up the group number manually, and you don't need any additional packages.

idx <- which(df$comp == "story")
first <- idx[cumsum(c(1, diff(idx) != 1)) == 1]
first
#[1] 2 3 4 5

mean(df$hbr[first])
#[1] 103.5
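
To make the cumsum/diff step easier to follow, here is a sketch that spells out the intermediate values and then wraps the same logic in a small function. The helper run_mean and its arguments (x, by, value, n) are hypothetical names introduced for illustration; it assumes that value occurs at least once in by.

```r
df <- data.frame(
  comp = c("pre", rep("story", 4), rep("x", 2), rep("story", 3)),
  hbr = 101:110
)

idx <- which(df$comp == "story")   # 2 3 4 5 8 9 10
gaps <- diff(idx) != 1             # FALSE FALSE FALSE TRUE FALSE FALSE
run_id <- cumsum(c(1, gaps))       # 1 1 1 1 2 2 2

# Hypothetical helper: mean of x over the n-th run of rows where by == value
run_mean <- function(x, by, value, n = 1) {
  idx <- which(by == value)
  run_id <- cumsum(c(1, diff(idx) != 1))
  mean(x[idx[run_id == n]])
}

m1 <- run_mean(df$hbr, df$comp, "story")      # first run
m2 <- run_mean(df$hbr, df$comp, "story", 2)   # second run
m1
#[1] 103.5
m2
#[1] 109
```

A gap larger than 1 between consecutive "story" row numbers marks the start of a new run, which is why the cumulative sum of those gaps works as a run counter.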

solution1
2 ACCEPTED 2022-05-16 08:52:24

solution2
2 2022-05-16 09:01:03
