R: How to count consecutive occurrences with a length condition in a longitudinal database?

Question

I am working in R with a longitudinal database of individuals, with several rows per ID (named vn in the database) and their attributes in columns. My variable observation indicates each year of observation, and maritalstatus indicates whether the person is married (1) or not (0).

Here is an overview of one individual in my database:

structure(list(vn = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), maritalstatus = c(0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1), observation = c(2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018)), row.names = c(NA, -19L), class = "data.frame")

I am looking for a way to create a new variable that counts the number of consecutive occurrences, but only for the first run whose length is greater than or equal to 5. For this example it would be:

marital_length = c(0, 0, 0, 0, 0, 0, 5, 5, 5, 5, 5, 0, 0, 0, 0, 0, 0, 0, 0)
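To make the target concrete, here is a small sketch using base R's rle() on the single individual above. It shows how the status vector splits into runs and which run is the first one lasting at least 5 years. The variable names (runs, first_long, start_pos, end_pos) are only illustrative.

```r
# Marital status of the individual vn = 1 shown in the question
maritalstatus <- c(0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1)
observation <- 2000:2018

# rle() splits the vector into runs of identical consecutive values
runs <- rle(maritalstatus)
runs$values    # 0 1 0 1 0 1
runs$lengths   # 1 3 2 5 1 7

# Position of the first run lasting at least 5 years (NA if there is none)
first_long <- which(runs$lengths >= 5)[1]   # 4

# Rows covered by that run: end = cumulative length, start = end - length + 1
end_pos <- cumsum(runs$lengths)[first_long]             # 11
start_pos <- end_pos - runs$lengths[first_long] + 1     # 7
observation[c(start_pos, end_pos)]                      # 2006 2010
```

Rows 7 to 11 are exactly the rows that carry the value 5 in the expected marital_length; the later 7-year run is ignored because it is not the first qualifying run.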

My current code (below) creates a variable that counts the maximum length of consecutive values, but I couldn't find a way to add a condition so that it counts only the first run whose length is >= 5.


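# most_consecutive_val() is my own helper (definition not shown); it returns the longest run length
maritalstatus_consecutive <- tapply(test$maritalstatus, INDEX = test$vn, most_consecutive_val)

# Index by the vn label (the names of the tapply result) rather than by position
test$marital_length <- maritalstatus_consecutive[as.character(test$vn)]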

I also tried to use min() (instead of max()), but if, for instance, a person is married for 2 years, divorced, and then married for 6 years, this new variable won't show that she was married for 6 years unless I add the condition >= 5.
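The difference between max(), min() and "first run of at least 5" can be seen directly on run lengths. Below is a minimal sketch; run_summaries() is a hypothetical helper, and the second vector is the made-up "married 2 years, divorced, married 6 years" case described above. Note that rle() counts runs of 0s as well as runs of 1s.

```r
# Hypothetical helper: summarise the run lengths of a 0/1 status vector
run_summaries <- function(x) {
  len <- rle(x)$lengths
  c(max = max(len),                 # longest run, regardless of when it occurs
    min = min(len),                 # shortest run (often a single divorced year)
    first_ge5 = len[len >= 5][1])   # first run lasting >= 5 years, NA if none
}

# Individual vn = 1 from the question: run lengths 1 3 2 5 1 7
run_summaries(c(0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1))
# max = 7, min = 1, first_ge5 = 5

# Hypothetical person: married 2 years, divorced 1 year, married 6 years
run_summaries(c(1, 1, 0, 1, 1, 1, 1, 1, 1))
# max = 6, min = 1, first_ge5 = 6
```

For vn = 1, max() picks the later 7-year run, while the desired value is the first qualifying run of 5; min() only ever finds the shortest spell.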

Does anyone have an idea for code that could help me?

Answer 1

I'm not entirely sure what your expected output is trying to represent. If you'd just like the length of the first marriage lasting at least 5 years for each vn, you could use:

tapply(df$maritalstatus, df$vn, function(x) with(rle(x), lengths[lengths >= 5][1]) )
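A sketch of what this returns with more than one person, assuming a hypothetical second individual (vn = 2) whose status never stays the same for 5 consecutive years. The result has one value per vn and is NA when no run qualifies; the last line shows one way to copy that value back onto every row by matching on the vn label.

```r
# vn = 1 is the individual from the question; vn = 2 is a hypothetical person
# whose status never stays the same for 5 consecutive years
df <- data.frame(
  vn = c(rep(1, 19), rep(2, 4)),
  maritalstatus = c(0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1,
                    1, 1, 0, 1)
)

# Answer 1: one value per vn, the length of the first run >= 5 (NA if none)
first_len <- tapply(df$maritalstatus, df$vn,
                    function(x) with(rle(x), lengths[lengths >= 5][1]))
first_len  # vn "1" -> 5, vn "2" -> NA

# Copy the per-person value onto every row, matching on the vn label
# (the names of first_len) rather than on position
df$first_len <- as.vector(first_len[as.character(df$vn)])
```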

Answer 2

Maybe this is too convoluted, but it seems to work:

df$marital_length <- with(df, ave(maritalstatus, vn, FUN = function(x) 
                with(rle(x), rep(as.integer(seq_along(lengths) == 
                     which.max(lengths >= 5)) * lengths, lengths))))


df$marital_length
#[1] 0 0 0 0 0 0 5 5 5 5 5 0 0 0 0 0 0 0 0

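which.max(lengths >= 5) gives the index of the first run whose length is greater than or equal to 5, because which.max() returns the position of the first TRUE in a logical vector. If no run reaches 5, every element is FALSE and which.max() returns 1, so the length of the first run would be kept.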
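One possible adjustment, sketched under the assumption that rows belonging to a person with no run of 5 or more should all be 0: replace which.max() with which(...)[1], which yields NA when nothing qualifies, and use %in% so that NA matches no run. first_run_length() and the vn = 2 data are hypothetical; the grouping still uses base R's ave() as in the answer.

```r
# Hypothetical helper: per-row length of the first run with length >= min_len
first_run_length <- function(x, min_len = 5) {
  r <- rle(x)
  # Index of the first qualifying run; NA when no run is long enough
  first <- which(r$lengths >= min_len)[1]
  # TRUE only for that run; all FALSE when first is NA
  keep <- seq_along(r$lengths) %in% first
  # Keep the run's length, 0 elsewhere, then expand back to one value per row
  rep(ifelse(keep, r$lengths, 0L), r$lengths)
}

# vn = 1 from the question plus a hypothetical vn = 2 with run lengths 2 1 1
df <- data.frame(
  vn = c(rep(1, 19), rep(2, 4)),
  maritalstatus = c(0, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1,
                    1, 1, 0, 1)
)

df$marital_length <- with(df, ave(maritalstatus, vn, FUN = first_run_length))
# vn = 1: 0 0 0 0 0 0 5 5 5 5 5 0 0 0 0 0 0 0 0
# vn = 2: 0 0 0 0 (the which.max() version would give 2 2 0 0)
```

Like the answer, this counts runs of 0s as well as 1s. To count only married spells, the which() condition could be extended to r$lengths >= min_len & r$values == 1.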

Answer 1: score 1, posted 2020-07-07 11:56:45

Answer 2: score 0, accepted, posted 2020-07-07 11:53:03