
Extracting event rows from a data frame

I have this data frame:

df <- read.table(header = TRUE, text = "
ID  var TIME  value  method
 1   3   0     2      1
 1   3   2     2      1
 1   3   3     0      1
 1   4   0     10     1
 1   4   2     10     1
 1   4   4     5      1
 1   4   6     5      1
 2   3   0     2      1
 2   3   2     2      1
 2   3   3     0      1
 2   4   0     10     1
 2   4   2     10     1
 2   4   4     5      1
 2   4   6     5      1
")

I want to extract the rows at which a new event occurs in the value column. For example, for ID=1, var=3 has a value of 2 at TIME=0. This value stays the same at TIME=2, so I would keep only the first row (TIME=0) and discard the second. In the third row, however, the value for var=3 has changed to zero, so I also have to extract that row. And so on for the rest of the variables. This has to be applied for every subject ID. For the above df, the result should be as follows:

dfevent <-  
ID  var TIME  value  method
 1   3   0     2      1
 1   3   3     0      1
 1   4   0     10     1
 1   4   4     5      1 
 2   3   0     2      1
 2   3   3     0      1
 2   4   0     10     1
 2   4   4     5      1 

Could anyone help me do this in R? I have a huge data set and I want to extract the rows at which a new event has occurred in the value of every var. I have 5 variables in the data frame, numbered 3, 4, 5, 6, and 7. The above is an example with 2 of them (variables 3 and 4).

This does it using dplyr

 library(dplyr)

 df %>% 
  group_by(ID, var) %>%
  mutate(tf = ifelse(value==lag(value), 1, 0))  %>%
  filter(is.na(tf) | tf==0) %>%
  select(-tf)


#  ID var TIME value method
#1  1   3    0     2      1
#2  1   3    3     0      1
#3  1   4    0    10      1
#4  1   4    4     5      1
#5  2   3    0     2      1
#6  2   3    3     0      1
#7  2   4    0    10      1
#8  2   4    4     5      1

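Basically, I created an extra variable, tf, that is 1 when the value is the same as in the preceding row within each unique ID/var group, and 0 when it differs. On the first row of each group lag() has nothing to compare against, so tf is NA there. We keep the rows where tf is NA or 0, then drop the helper variable before returning the output.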
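The helper column is optional. As a minimal sketch of the same idea, the comparison with lag() can go straight into filter(). The toy data frame and the names toy and events are made up for illustration, and the sketch assumes value contains no missing values.

```r
library(dplyr)

# Hypothetical toy data in the question's layout (one subject only)
toy <- data.frame(
  ID    = c(1, 1, 1, 1, 1),
  var   = c(3, 3, 3, 4, 4),
  TIME  = c(0, 2, 3, 0, 2),
  value = c(2, 2, 0, 0, 0)
)

events <- toy %>%
  group_by(ID, var) %>%
  # lag(value) is NA on the first row of each group, so that row is always kept;
  # later rows are kept only when the value differs from the previous row
  filter(is.na(lag(value)) | value != lag(value)) %>%
  ungroup()

# Keeps TIME 0 and TIME 3 for var 3, and TIME 0 for var 4
events
```

Note that var 4 starts with the same value (0) that var 3 ended on; its first row is still kept because the comparison restarts within each ID/var group.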

Base solution:

# group by ID and var so that the first row of every variable is always kept
df[with(df, abs(ave(value, ID, var, FUN = function(x) c(1, diff(x)))) > 0), ]

#   ID var TIME value method
#1   1   3    0     2      1
#3   1   3    3     0      1
#4   1   4    0    10      1
#6   1   4    4     5      1
#8   2   3    0     2      1
#10  2   3    3     0      1
#11  2   4    0    10      1
#13  2   4    4     5      1
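To see how the c(1, diff(x)) trick behaves, here is a small sketch on made-up data. It assumes events should be detected separately for each ID/var pair, as the question describes, and compares that with grouping by ID alone. The names toy, by_id_var and by_id are illustrative only.

```r
# Hypothetical toy data: var 4 starts with the same value var 3 ended on
toy <- data.frame(
  ID    = c(1, 1, 1, 1, 1),
  var   = c(3, 3, 3, 4, 4),
  TIME  = c(0, 2, 3, 0, 2),
  value = c(2, 2, 0, 0, 0)
)

# Per group: 1 for the first row, then the successive differences.
# A non-zero entry marks a row where the value changed (or a group started).
by_id_var <- with(toy, ave(value, ID, var, FUN = function(x) c(1, diff(x))))
by_id     <- with(toy, ave(value, ID,      FUN = function(x) c(1, diff(x))))

by_id_var  # 1 0 -2 1 0
by_id      # 1 0 -2 0 0

toy[by_id_var != 0, ]  # rows 1, 3 and 4
toy[by_id != 0, ]      # rows 1 and 3: the first row of var 4 is lost
```

On the data in the question both groupings give the same rows, because the value happens to change whenever var changes; the difference only shows up when a variable starts at the value the previous one ended on.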

From the expected results, you may also try rleid from data.table

library(data.table) # data.table_1.9.5
setDT(df)[df[, .I[1L], list(ID, var, rleid(value))]$V1]
#    ID var TIME value method
#1:  1   3    0     2      1
#2:  1   3    3     0      1
#3:  1   4    0    10      1
#4:  1   4    4     5      1
#5:  2   3    0     2      1
#6:  2   3    3     0      1
#7:  2   4    0    10      1
#8:  2   4    4     5      1

Or a similar approach as @thelatemail

setDT(df)[df[, .I[abs(c(1, diff(value))) > 0], list(ID, var)]$V1]

Or

unique(setDT(df)[, id:=rleid(value)], by=c('ID', 'var', 'id'))
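For readers unfamiliar with rleid, a short sketch of what it returns may help: it numbers consecutive runs of equal values, and with several arguments a new run starts whenever any of them changes. The toy table and the column name run are made up for illustration.

```r
library(data.table)

# Run ids for a single vector: each change in value starts a new run
rleid(c(2, 2, 0, 10, 10, 5, 5))  # 1 1 2 3 3 4 4

# Hypothetical toy data in the question's layout (one subject only)
toy <- data.table(
  ID    = c(1, 1, 1, 1, 1),
  var   = c(3, 3, 3, 4, 4),
  TIME  = c(0, 2, 3, 0, 2),
  value = c(2, 2, 0, 0, 0)
)

# With several columns, a run ends as soon as ID, var or value changes
toy[, run := rleid(ID, var, value)]
toy$run  # 1 1 2 3 3

# The first row of each run is the row where a new event starts
first_rows <- toy[!duplicated(run)]
first_rows
```

Passing ID and var to rleid alongside value is one way to make sure a run never spans two variables or two subjects.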
