Create new variable based on prior observation value from another column
I am constructing a new variable whose value is contingent on the prior row in another column. Therefore, the order of the data is important. This is how my data currently looks:
ID Cong Comm Y
1 52 3 0
1 53 3 0
1 54 3 1
1 53 4 1
2 50 2 1
2 50 7 1
3 48 4 1
4 48 3 1
4 48 7 0
4 49 7 1
I would like to create a new variable called Y2. If an observation has Y=0, then Y2 in that observation should equal 1. If the following row also has Y=0, then add 1 to the previous Y2 value (so Y2 for that observation should equal 2). Continue this process until a row has Y=1, add 1 for that row as well, and then stop. Essentially, the new variable counts up until the other column's value equals 1, and then the process repeats.
This is what it should look like:
ID Cong Comm Y Y2
1 52 3 0 1
1 53 3 0 2
1 54 3 1 3
1 53 4 1 1
2 50 2 1 1
2 50 7 1 1
3 48 4 1 1
4 48 3 1 1
4 48 7 0 1
4 49 7 1 2
Here is my sample dataframe.
df <- data.frame(
  ID = c(1L, 1L, 1L, 1L, 2L, 2L, 3L, 4L, 4L, 4L),
  Cong = c(52L, 53L, 54L, 53L, 50L, 50L, 48L, 48L, 48L, 49L),
  Comm = c(3L, 3L, 3L, 4L, 2L, 7L, 4L, 3L, 7L, 7L),
  Y = c(0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L))
Would a loop or an if-else command be the best way to tackle this? I tried an if-else statement, but my code did not work. Any recommendations would be great.
You can do it like this, supposing your data.frame is df:
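# Note: f7() (defined further down) must be created before running this.
y = df$Y
# TRUE where a row's Y equals the previous row's Y (the first row is compared with 0)
bool = y == c(0, head(y, -1))
# Set Y to 0 wherever it changed, so the 1 that ends a run of 0s joins that run
y[which(bool %in% FALSE)] = 0
# Count along each run of zeros; every remaining 1 gets Y2 = 1
df$Y2 = ifelse(y == 0, f7(!y), 1)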
# ID Cong Comm Y Y2
#1 1 52 3 0 1
#2 1 53 3 0 2
#3 1 54 3 1 3
#4 1 53 4 1 1
#5 2 50 2 1 1
#6 2 50 7 1 1
#7 3 48 4 1 1
#8 4 48 3 1 1
#9 4 48 7 0 1
#10 4 49 7 1 2
The trick is done with:
f7 <- function(x) { tmp <- cumsum(x); tmp - cummax((!x) * tmp) }
It is taken entirely from this great post:
count how many consecutive values are true
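To see why f7 counts consecutive TRUE values, it can help to unpack it into steps. The following is only an illustrative sketch: the input vector x and the intermediate names (running, last_reset, streak) are made up for this example and do not come from the linked post.

```r
# f7 as given in the answer, spread over several lines
f7 <- function(x) {
  tmp <- cumsum(x)            # running total of TRUEs seen so far
  tmp - cummax((!x) * tmp)    # subtract the total reached at the last FALSE
}

x <- c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, FALSE)  # made-up example input

# The same computation, one named step at a time
running    <- cumsum(x)                # 1 2 2 3 4 5 5
last_reset <- cummax((!x) * running)   # 0 0 2 2 2 2 5
streak     <- running - last_reset     # 1 2 0 1 2 3 0
```

Each FALSE "freezes" the running total at that point, so subtracting it restarts the count at the next TRUE.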
Finally, this solution is entirely vectorized, with no loop.
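One caveat worth checking on your own data: as far as I can tell, the snippet above flags every change in Y, so when a run of 0s starts right after a 1 that closed the previous run (for example Y = c(0, 1, 0, 1)), the two runs end up counted as one. Below is a minimal sketch of an alternative, assuming the rule is "start a new count on the row after every Y == 1". The function name count_until_one is hypothetical, and like the code above it ignores ID boundaries.

```r
# Hypothetical helper: restart the count on the row after each Y == 1
count_until_one <- function(y) {
  prev <- c(0, head(y, -1))   # Y of the previous row (0 for the first row)
  grp  <- cumsum(prev == 1)   # group id increases after every row with Y == 1
  ave(seq_along(y), grp, FUN = seq_along)  # position of each row within its group
}

# Y column from the question
df <- data.frame(Y = c(0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 0L, 1L))
df$Y2 <- count_until_one(df$Y)
df$Y2
# 1 2 3 1 1 1 1 1 1 2
```

This is also vectorized; ave() simply numbers the rows inside each group, so no explicit loop or if-else is needed.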