
Count number of consecutive occurrences in sequence per group in R

I have a small programming problem I cannot seem to figure out. I am wondering how I can elegantly count the number of consecutive numbers in a sequence, starting from a given value, per group in R.

For example, we have a data frame with names and numbers, and would like to reduce it to one row per name, with a second column holding the number of consecutive entries for that name.

names <- c(rep("bob",5), rep("henry",5), rep("maria",5))
goals <- c(1,2,3,5,4, 4,3,4,5,2, 1,2,4,6,5)
input.df <- data.frame(names, goals)

So, starting from 1, the output data frame would look like the one below: "bob" gets 3, since his goals run sequentially from 1 to 3; henry gets 0, because he has no 1, so no run starts there; and maria gets 2, because her entries run from 1 to 2.

names <- c("bob", "henry", "maria")
runs <- c("3", "0", "2")
output.df.from.1 <- data.frame(names, runs)

Starting from 3, both bob and maria would have 0 (bob has a 3, but it is not followed by a 4, and maria has no 3), while henry would now have 3, since he has 3, 4, 5.

names <- c("bob", "henry", "maria")
runs <- c("0", "3", "0")
output.df.from.3 <- data.frame(names, runs)

I am certain there must be a simple solution to this, but I have not been able to find one; I might be searching for the wrong terms.

Does anyone have a suggestion?

Here is a possible solution to your question. The idea is to 1) first find the (possibly multiple) runs of consecutive numbers for each person, then 2) given a value, find the length of the run starting from that value.

I changed your example data a bit to cover the case where a person has several runs of consecutive numbers (e.g. bob now has the numbers 1,2,3,5,4,7,8,9, and his consecutive groups are 1,2,3 and 7,8,9).

  1. Find the consecutive numbers for each person. First group by names; then, within each group, compare each goal with the previous and the next goal. A value belongs to a consecutive group if either previous_goal - current_goal = -1 or next_goal - current_goal = 1. Note that I use both previous and next in order to retain all the values in a consecutive group.
library(tidyverse)
names <- c(rep("bob",8), rep("henry",5), rep("maria",5))
goals <- c(1,2,3,5,4, 7,8,9, 4,3,4,5,2, 1,2,4,6,5)
df1 <- data.frame(names, goals) 

df2 <- df1 %>% 
  group_by(names) %>%  
  mutate(goals_lag = lag(goals) - goals) %>% 
  mutate(goals_lead = lead(goals) - goals) %>% 
  filter(goals_lag == -1 | goals_lead == 1) %>% 
  select(-goals_lag, -goals_lead)
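
To see what the filter in step 1 keeps, the same check can be reproduced on a plain vector in base R. This is only an illustrative sketch: x, prev_diff, next_diff and keep are names made up here, and x holds bob's goals from the modified data.

```r
# bob's goals from the modified example data
x <- c(1, 2, 3, 5, 4, 7, 8, 9)

# previous value minus current value (NA for the first element)
prev_diff <- c(NA, head(x, -1)) - x
# next value minus current value (NA for the last element)
next_diff <- c(tail(x, -1), NA) - x

# keep a value if it continues a run or starts one
keep <- (prev_diff == -1) | (next_diff == 1)

# which() drops NA results, just as filter() drops rows whose condition is NA
x[which(keep)]
# 1 2 3 7 8 9
```

The isolated values 5 and 4 are dropped, which leaves the two groups 1,2,3 and 7,8,9 for the next step.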
  2. Write a function that calculates the length of the run of consecutive numbers starting from a given value. bob has two consecutive groups, 1,2,3 and 7,8,9. If the given value is 1, the length should be 3, not 6. We therefore need to know where each consecutive group starts (the group 7,8,9 starts at index 4). Once we have located the position of the given value (for the value 1, the index is 1), the length is the start position of the next group minus the position of the given value (4 - 1 = 3 in this case).
cons_len <- function(df, name, start_val){

  # take this person's goals as a vector
  vec <- (df %>% filter(names == name))$goals
  # find the starting positions of the second and later groups, and append
  # length(vec) + 1 as a sentinel so that the last group has an end boundary too
  vec_stops <- c(which((vec - c(vec[1] - 1, vec[-length(vec)])) != 1),
                 length(vec) + 1)
  # find the index of the given value (first occurrence only)
  vec_start <- which(vec == start_val)[1]

  # if the value is not found, return 0
  if (is.na(vec_start)) {
    return(0)
  }

  # distance from the given value to the first boundary strictly after it
  len <- vec_stops[vec_start < vec_stops][1] - vec_start
  # a value with no successor is not a run, so report 0 instead of 1
  if (len == 1) 0 else len
}

# apply to each name
sapply(unique(df1$names), function(name) cons_len(df2, name, 1))
# bob henry maria 
# 3     0     2 

sapply(unique(df1$names), function(name) cons_len(df2, name, 3))
# bob henry maria 
# 0     3     0 
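
For comparison, the original question (original data, one count per start value) can also be handled without the filtering step. The following is a minimal base-R sketch, not the method used above: run_from is a hypothetical helper that finds the first occurrence of the start value and walks forward while each next entry is exactly one larger. It assumes that only the first occurrence of the start value matters and, like the expected output in the question, it reports a lone value as 0.

```r
# original example data from the question
names <- c(rep("bob", 5), rep("henry", 5), rep("maria", 5))
goals <- c(1, 2, 3, 5, 4,  4, 3, 4, 5, 2,  1, 2, 4, 6, 5)
input.df <- data.frame(names, goals)

# hypothetical helper: length of the run start, start + 1, ... in x,
# beginning at the first occurrence of start
run_from <- function(x, start) {
  i <- match(start, x)            # first position of start, NA if absent
  if (is.na(i)) return(0L)
  n <- 1L
  # extend the run while the next entry is exactly one larger
  while (i + n <= length(x) && x[i + n] == start + n) n <- n + 1L
  if (n == 1L) 0L else n          # a lone value does not count as a run
}

# one goals vector per name, then one count per name
from_1 <- sapply(split(input.df$goals, input.df$names), run_from, start = 1)
from_3 <- sapply(split(input.df$goals, input.df$names), run_from, start = 3)

from_1  # bob = 3, henry = 0, maria = 2
from_3  # bob = 0, henry = 3, maria = 0
```

If a start value can occur more than once per person, the helper would need to check every occurrence (for example with which() instead of match()) and keep the longest run.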
