Holes in ordered groups
I have a data frame ordered by id and year, with individuals observed n times over a number of years. The number of observations per individual per year is irregular. I define a "hole" in the data as an observation where x2=1 and the observation immediately above it, for the same id (not necessarily for the same year), has x2=0. For example, individual A has a hole in 2002. When this happens, I need to create a variable that stores the value of x1 from the observation immediately above, the one for which x2=0. In the example of individual A, the new variable would then need to equal 5 on the row where x2=1.
x1 = c(5,3,2,2,5,7,7,3,4,8)
x2 = c(0,1,0,1,0,1,0,1,0,1)
id = c("A","A","A","B","B","C","C","C","D","D")
year = c(2001,2002,2003,2001,2002,2001,2001,2002,2001,2002)
df = data.frame(year,id,x1,x2)
Considering this sample data frame, I would need the new variable to look like this:
outcome = c(NA,5,NA,NA,NA,NA,NA,7,NA,4)
The dataset I'm working with has close to 10,000,000 observations, for 3,000,000 individuals over 4 years, so I can't do this manually. Is there a general way to achieve this that works with any dataset, regardless of its size?
I went through a few posts here that use for loops to iterate over groups (one example was "Iterating a for loop over groups in a dataset"), but I wasn't able to apply any of them. I've been trying to do this in R after being unsuccessful in Stata 14. I wasn't able to find any post that deals with ordered groups, which is what I'm looking for.
Here's a simple way to get your outcome with dplyr.
library(dplyr)
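# Within each id, find rows where x2 is 1 and the previous row's x2 is 0,
# and copy that previous row's x1. lag() is NA on the first row of each id.
df %>%
  group_by(id) %>%
  mutate(
    outcome = ifelse(x2 == 1 & lag(x2) == 0, lag(x1), NA)
  )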
Result
# A tibble: 10 × 5
# Groups:   id [4]
    year id       x1    x2 outcome
   <dbl> <chr> <dbl> <dbl>   <dbl>
 1  2001 A         5     0      NA
 2  2002 A         3     1       5
 3  2003 A         2     0      NA
 4  2001 B         2     1      NA
 5  2002 B         5     0      NA
 6  2001 C         7     1      NA
 7  2001 C         7     0      NA
 8  2002 C         3     1       7
 9  2001 D         4     0      NA
10  2002 D         8     1       4
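Since the question mentions roughly 10 million rows, the same rule can also be written in base R without grouping. This is only a sketch, not a benchmarked solution, and it assumes (as stated in the question) that the rows are already sorted by id and then year, so the previous observation for an id is simply the previous row. The helper names prev_same_id, prev_x1, prev_x2 and hole are made up for illustration.

```r
# Sample data from the question
x1 <- c(5, 3, 2, 2, 5, 7, 7, 3, 4, 8)
x2 <- c(0, 1, 0, 1, 0, 1, 0, 1, 0, 1)
id <- c("A", "A", "A", "B", "B", "C", "C", "C", "D", "D")
year <- c(2001, 2002, 2003, 2001, 2002, 2001, 2001, 2002, 2001, 2002)
df <- data.frame(year, id, x1, x2)

n <- nrow(df)

# TRUE when the row directly above belongs to the same id
# (always FALSE for the very first row)
prev_same_id <- c(FALSE, df$id[-1] == df$id[-n])

# Values from the row directly above (NA for the very first row)
prev_x1 <- c(NA, df$x1[-n])
prev_x2 <- c(NA, df$x2[-n])

# A hole: same id as the row above, x2 is 1 here and was 0 above
hole <- prev_same_id & df$x2 == 1 & prev_x2 == 0

# Store the x1 from the row above on hole rows, NA everywhere else
df$outcome <- ifelse(hole, prev_x1, NA)
```

Because prev_same_id is FALSE on the first row of every id, the comparison never reaches across two individuals, which is what group_by(id) does in the dplyr version.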