R: How to count the number of consecutive occurrences in a longitudinal database with a length condition?
I am working in R with a longitudinal database about individuals, with several rows per ID (named vn in the database) and their attributes in columns. My variable observation indicates each year of observation, and maritalstatus indicates whether the person is married (1) or not (0).
Here is an overview of one individual in my database:
structure(list(vn = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), maritalstatus = c(0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1), observation = c(2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018)), row.names = c(NA, -19L), class = "data.frame")
I am looking for a way to create a new variable that gives the length of a run of consecutive occurrences, but only for the first run whose length is greater than or equal to 5. For this example it would be:
marital_length = c (0, 0, 0, 0, 0, 0, 5, 5, 5, 5, 5, 0, 0, 0, 0, 0, 0, 0, 0)
My current code (below) creates a variable that holds the maximum length of consecutive values, but I have not found a way to add a condition so that only the first run with length >= 5 is counted.
maritalstatus_consecutive <- tapply(test$maritalstatus, INDEX = test$vn, most_consecutive_val)
test$marital_length <- maritalstatus_consecutive[test$vn]
I also tried to use min() (instead of max), but then, for instance, if a person is married for 2 years, divorced, and then married for 6 years, the new variable will not show that she was married for 6 years unless I add the condition >= 5.
Does anyone have an idea for code that could help me?
I'm not entirely sure what your expected output is trying to represent. If you'd like just the length of the first marriage lasting >= 5 years for each vn, you could use
tapply(df$maritalstatus, df$vn, function(x) with(rle(x), lengths[lengths >= 5][1]) )
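To see what the rle() call above works with, here is a minimal sketch on the example individual's maritalstatus vector. The names x, r, first_any and first_married are only illustrative. Note that the expression in the answer filters on run length alone, so a run of five or more 0s would also qualify; the last line shows one possible way to restrict the match to married runs, assuming that is what is wanted.

```r
# maritalstatus values of the example individual (vn == 1)
x <- c(0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1)

# rle() splits the vector into runs of identical values
r <- rle(x)
r$lengths  # 1 3 2 5 1 7
r$values   # 0 1 0 1 0 1

# Length of the first run with at least 5 elements, whatever its value
first_any <- r$lengths[r$lengths >= 5][1]                      # 5

# Same idea, but only runs of 1 (married) are considered
first_married <- r$lengths[r$values == 1 & r$lengths >= 5][1]  # 5
```

If no run qualifies, both expressions return NA.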
Maybe this is too convoluted, but it seems to work:
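df$marital_length <- with(df, ave(maritalstatus, vn, FUN = function(x)
  # Keep the length of the first run with lengths >= 5, zero out every other run,
  # then expand back to one value per row.
  with(rle(x), rep(as.integer(seq_along(lengths) ==
    which.max(lengths >= 5)) * lengths, lengths))))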
df$marital_length
#[1] 0 0 0 0 0 0 5 5 5 5 5 0 0 0 0 0 0 0 0
which.max(lengths >= 5) gives the index of the first run whose length is greater than or equal to 5.
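Two edge cases may be worth handling, depending on the data. which.max() returns 1 when no run reaches length 5, so the first run would be flagged with its own length, and the test on lengths alone would also match a long run of 0s. The sketch below is one possible variant, not the code from the answer: first_long_run and min_len are hypothetical names, and the second individual (vn == 2) is made-up data added only to show the behaviour.

```r
# Hypothetical helper: flag only the first run of 1s with length >= min_len
first_long_run <- function(x, min_len = 5) {
  r <- rle(x)
  # Indices of runs that are married (value 1) and long enough
  hit <- which(r$values == 1 & r$lengths >= min_len)
  out <- integer(length(r$lengths))
  # Keep the length of the first qualifying run, leave all others at 0
  if (length(hit) > 0) out[hit[1]] <- r$lengths[hit[1]]
  rep(out, r$lengths)
}

# Example individual from the question plus an invented second individual
# who is never married for 5 consecutive years
df <- data.frame(
  vn = rep(c(1, 2), times = c(19, 6)),
  maritalstatus = c(0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1,
                    1, 1, 0, 1, 1, 0)
)

df$marital_length <- ave(df$maritalstatus, df$vn, FUN = first_long_run)
df$marital_length[df$vn == 1]
# 0 0 0 0 0 0 5 5 5 5 5 0 0 0 0 0 0 0 0
df$marital_length[df$vn == 2]
# 0 0 0 0 0 0
```

Using which() instead of which.max() makes the "no qualifying run" case explicit, since it returns an empty vector rather than 1.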