Extracting event rows from a data frame
I have this data frame:
df <-
ID var TIME value method
1 3 0 2 1
1 3 2 2 1
1 3 3 0 1
1 4 0 10 1
1 4 2 10 1
1 4 4 5 1
1 4 6 5 1
2 3 0 2 1
2 3 2 2 1
2 3 3 0 1
2 4 0 10 1
2 4 2 10 1
2 4 4 5 1
2 4 6 5 1
I want to extract the rows at which a new event occurs in the value column. For example, for ID=1, var=3 has a value of 2 at TIME=0. This value stays the same at TIME=2, so I would take only the first row at TIME=0 and discard the second row. In the third row, however, the value for var=3 has changed to zero, so I also have to extract this row. And so on for the rest of the variables. This has to be applied for every subject ID. For the above df, the result should be as follows:
dfevent <-
ID var TIME value method
1 3 0 2 1
1 3 3 0 1
1 4 0 10 1
1 4 4 5 1
2 3 0 2 1
2 3 3 0 1
2 4 0 10 1
2 4 4 5 1
Could anyone help me do this in R? I have a huge data set, and I want to extract the rows at which a new event has occurred in the value of every var. The variables in the data frame are numbered 3, 4, 5, 6, and 7. The above is an example for 2 variables (variable numbers 3 and 4).
This does it using dplyr:
library(dplyr)
df %>%
group_by(ID, var) %>%
mutate(tf = ifelse(value==lag(value), 1, 0)) %>%
filter(is.na(tf) | tf==0) %>%
select(-tf)
# ID var TIME value method
#1 1 3 0 2 1
#2 1 3 3 0 1
#3 1 4 0 10 1
#4 1 4 4 5 1
#5 2 3 0 2 1
#6 2 3 3 0 1
#7 2 4 0 10 1
#8 2 4 4 5 1
Basically, I created an extra variable that is 1 when the value is the same as in the preceding row within each unique ID/var group, and 0 when it differs. The first row of each group has no preceding row, so it gets NA. We keep the rows where this variable is NA or 0 and then drop the variable before returning the output.
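To make the flag concrete, here is a minimal base R sketch of what happens inside a single group (var=4 for ID=1). It mimics lag() by hand, so the names prev, tf, and keep are only illustrative and are not part of the dplyr pipeline above.

```r
# Values of var = 4 for ID = 1, in TIME order
value <- c(10, 10, 5, 5)

# Hand-rolled equivalent of lag(value): shift down by one, pad with NA
prev <- c(NA, head(value, -1))

# 1 = same as previous row, 0 = changed, NA = first row of the group
tf <- ifelse(value == prev, 1, 0)

# Keep the first row of the group and every row where the value changed
keep <- is.na(tf) | tf == 0

value[keep]  # 10 5
```

The NA case is what keeps the first row of every group, which is why the filter tests is.na(tf) as well as tf == 0.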
Base solution:
# group by ID and var so that the first row of each var is always kept
df[with(df, abs(ave(value, ID, var, FUN = function(x) c(1, diff(x)))) > 0), ]
# ID var TIME value method
#1 1 3 0 2 1
#3 1 3 3 0 1
#4 1 4 0 10 1
#6 1 4 4 5 1
#8 2 3 0 2 1
#10 2 3 3 0 1
#11 2 4 0 10 1
#13 2 4 4 5 1
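One thing worth checking with the diff() approach is the grouping. The sketch below uses made-up data (not the question's df) in which var 3 ends on the same value that var 4 starts with. Grouping by ID alone then drops the first row of var 4, while grouping by both ID and var keeps it. The names d and change are hypothetical.

```r
# Hypothetical data: var 3 ends at 5 and var 4 starts at 5 for the same ID
d <- data.frame(ID = 1, var = c(3, 3, 4, 4), value = c(2, 5, 5, 7))

# Non-zero entries mark a new event; the leading 1 always keeps the first row
change <- function(x) c(1, diff(x))

by_id     <- abs(ave(d$value, d$ID, FUN = change)) > 0
by_id_var <- abs(ave(d$value, d$ID, d$var, FUN = change)) > 0

by_id      # TRUE TRUE FALSE TRUE
by_id_var  # TRUE TRUE TRUE TRUE
```

On the example data in the question both groupings give the same rows, because the value always changes at the boundary between var 3 and var 4.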
Based on the expected results, you may also try rleid from data.table:
library(data.table)#data.table_1.9.5
setDT(df)[df[, .I[1L] , list(ID, var, rleid(value))]$V1]
# ID var TIME value method
#1: 1 3 0 2 1
#2: 1 3 3 0 1
#3: 1 4 0 10 1
#4: 1 4 4 5 1
#5: 2 3 0 2 1
#6: 2 3 3 0 1
#7: 2 4 0 10 1
#8: 2 4 4 5 1
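If it is not obvious what rleid contributes, it assigns a run id that increases by one each time the value changes. A rough base R stand-in built on rle() is sketched below; rleid_base is a hypothetical helper for illustration, not a data.table function.

```r
# Base R stand-in for a run-length id (hypothetical helper name)
rleid_base <- function(x) {
  r <- rle(x)                            # lengths of consecutive equal values
  rep(seq_along(r$lengths), r$lengths)   # one id per run, repeated over the run
}

# value column for ID = 1 in the question (var 3 followed by var 4)
value <- c(2, 2, 0, 10, 10, 5, 5)

rleid_base(value)  # 1 1 2 3 3 4 4
```

Grouping by ID, var, and this run id and taking the first row index of each group (.I[1L]) is what picks out the row where each new run starts.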
Or a similar approach to @thelatemail's:
# group by ID and var so that the first row of each var is always kept
setDT(df)[df[, .I[abs(c(1, diff(value))) > 0], by = list(ID, var)]$V1]
Or:
unique(setDT(df)[, id:=rleid(value)], by=c('ID', 'var', 'id'))