R: How to count the number of consecutive occurrences in a longitudinal database with a length condition?

I am working in R with a longitudinal database of individuals, with several rows per ID (named vn in the database) and their attributes in columns. The variable observation gives the year of observation, and maritalstatus indicates whether the person is married (1) or not (0).

Here is an overview of one individual in my database:

structure(list(vn = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1),
               maritalstatus = c(0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1),
               observation = c(2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009,
                               2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018)),
          row.names = c(NA, -19L), class = "data.frame")

I am looking for a way to create a new variable that gives the length of a run of consecutive married years, but only for the first run whose length is greater than or equal to 5. For this example it would be:

marital_length = c(0, 0, 0, 0, 0, 0, 5, 5, 5, 5, 5, 0, 0, 0, 0, 0, 0, 0, 0)

My current code (below) creates a variable holding the maximum number of consecutive occurrences, but I have not found a way to add a condition so that only the first run of length >= 5 is counted.


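# most_consecutive_val is a helper whose definition is not shown here
maritalstatus_consecutive <- tapply(test$maritalstatus, INDEX = test$vn, most_consecutive_val)

# tapply() names its result by vn, so look values up by name rather than by position
test$marital_length <- maritalstatus_consecutive[as.character(test$vn)]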

I also tried min() instead of max(). But if, for instance, a person is married for 2 years, divorces, and is then married for 6 years, the new variable would not show the 6-year marriage unless I add the condition >= 5.

Does anyone have an idea for code that could help me?

I'm not entirely sure what your expected output is trying to represent. If you'd just like the length of the first marriage lasting >= 5 years for each vn, you could use

tapply(df$maritalstatus, df$vn, function(x) with(rle(x), lengths[values == 1 & lengths >= 5][1]))
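
To see what the call is working with, it may help to look at what rle() returns for the example individual. A minimal sketch, assuming the sample data is reduced to a plain vector called x (a name introduced here for illustration only):

```r
# Marital status of the example individual (vn = 1), 2000-2018
x <- c(0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1)

# rle() collapses the vector into runs: `values` holds the repeated value,
# `lengths` holds how many times it repeats in a row
r <- rle(x)
r$values   # 0 1 0 1 0 1
r$lengths  # 1 3 2 5 1 7

# Length of the first run of 1s that lasts at least 5 observations
first_long <- r$lengths[r$values == 1 & r$lengths >= 5][1]
first_long  # 5
```

The second married spell (7 years) also satisfies the condition; the trailing [1] is what keeps only the first one.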

Maybe this is too convoluted, but it seems to work:

df$marital_length <- with(df, ave(maritalstatus, vn, FUN = function(x) 
                with(rle(x), rep(as.integer(seq_along(lengths) == 
                     which.max(lengths >= 5)) * lengths, lengths))))


df$marital_length
#[1] 0 0 0 0 0 0 5 5 5 5 5 0 0 0 0 0 0 0 0

which.max(lengths >= 5) gives the index of the first run whose length is greater than or equal to 5. That run keeps its length, every other run is multiplied by 0, and rep() then expands the result back to one value per row.
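
The one-liner relies on two things that hold for the example but not in general: rle() also reports runs of 0, and which.max() returns 1 when no run reaches a length of 5. Below is a more defensive sketch, under the assumption that only runs of 1 should count and that an individual with no qualifying run should get 0 everywhere. The names first_long_run and min_len, and the second individual (vn = 2), are made up for illustration and are not part of the original answer:

```r
# Hypothetical helper: for one individual's 0/1 vector, return a vector of the
# same length that holds the run length on the first run of 1s lasting at
# least `min_len` observations, and 0 everywhere else
first_long_run <- function(x, min_len = 5) {
  r <- rle(x)
  # Index of the first qualifying run; 0 if there is none
  hit <- match(TRUE, r$values == 1 & r$lengths >= min_len, nomatch = 0L)
  keep <- seq_along(r$lengths) == hit
  # Expand back from one value per run to one value per observation
  rep(ifelse(keep, r$lengths, 0L), times = r$lengths)
}

# Sample data: the individual from the question plus a made-up second one
# (vn = 2) who is unmarried for 6 years and then married for 2
df <- data.frame(
  vn = c(rep(1, 19), rep(2, 8)),
  maritalstatus = c(0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1,
                    0, 0, 0, 0, 0, 0, 1, 1)
)

# Apply per individual; ave() keeps the result aligned with the rows of df
df$marital_length <- ave(df$maritalstatus, df$vn, FUN = first_long_run)
df$marital_length
```

For vn = 1 this reproduces the expected output from the question. For vn = 2 every value is 0, because the only run of length >= 5 is a run of 0s and the marriage lasts just 2 years.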

