Count “gaps” between observations in R
I am having trouble replicating in R a project that was originally done in Stata. One of the key snags I'm hitting is that I need to generate a variable that counts the number of years since a certain observation. Here's a simple recreation of what the data might look like:
data <- cbind(1960:1970,
              c(NA, NA, 22, NA, NA, NA, 24, NA, NA, NA, 22),
              c(NA, NA, NA, NA, NA, NA, 4, NA, NA, NA, 4))
      [,1] [,2] [,3]
 [1,] 1960   NA   NA
 [2,] 1961   NA   NA
 [3,] 1962   22   NA
 [4,] 1963   NA   NA
 [5,] 1964   NA   NA
 [6,] 1965   NA   NA
 [7,] 1966   24    4
 [8,] 1967   NA   NA
 [9,] 1968   NA   NA
[10,] 1969   NA   NA
[11,] 1970   22    4
I currently have the first two columns of data, and I'm trying to automate the creation of column three with a function.
You can see that the third column is the number of years between consecutive non-NA values of the second column, but only after the first occurrence of the intervention (i.e. it is filled in from the second time column two has a value, not the first).
If it's any help, here is the Stata code that does the trick, where since is the third column in my simplified example. Basically, this code says: create a new variable since, defined as the number of years since variable redist (the second column in my example) previously had a value, starting after the first year in which redist has a value.
gen since=.
foreach n of numlist 1(1)10 {
replace since = year - year[_n-`n'] if redist!=. & redist[_n-`n']!=. & since==.
}
Thanks in advance for the help!
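For readers who do not use Stata, the loop above can be read as: for each lag n from 1 to 10, look n rows back, and if both the current row and that earlier row have a value of redist, and since is still missing, store the difference in years. A minimal R sketch of that literal translation follows. It assumes the two columns are held in plain vectors named year and redist (names borrowed from the Stata snippet, not from the matrix in the question), and it is meant as an illustration of the logic rather than the recommended way to do this in R.

```r
year   <- 1960:1970
redist <- c(NA, NA, 22, NA, NA, NA, 24, NA, NA, NA, 22)

since <- rep(NA_integer_, length(year))   # gen since = .
rows  <- seq_along(year)

for (n in 1:10) {                         # foreach n of numlist 1(1)10
  prev <- rows - n                        # row index n steps back (may be < 1)
  # current row observed, not yet filled, and the lagged row exists
  hit <- prev >= 1 & !is.na(redist) & is.na(since)
  # of those, keep only rows whose lagged redist is observed too
  hit[hit] <- !is.na(redist[prev[hit]])
  since[hit] <- year[hit] - year[prev[hit]]
}

since
```

As in the Stata version, the first observed row never gets a value, and a previous observation more than 10 rows back is ignored.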
You can add a column of NA values, then fill in the differences using a logical vector. This assumes we begin with only the first two columns.
data <- cbind(data, NA)
nona <- !is.na(data[,2])
data[,3][nona] <- c(NA, diff(data[,1][nona]))
data
#       [,1] [,2] [,3]
#  [1,] 1960   NA   NA
#  [2,] 1961   NA   NA
#  [3,] 1962   22   NA
#  [4,] 1963   NA   NA
#  [5,] 1964   NA   NA
#  [6,] 1965   NA   NA
#  [7,] 1966   24    4
#  [8,] 1967   NA   NA
#  [9,] 1968   NA   NA
# [10,] 1969   NA   NA
# [11,] 1970   22    4
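Since the question asks for a function, the same idea can be wrapped in one. The sketch below is only one possible packaging: years_since_last is a hypothetical name, and the data frame columns year and redist are assumed names taken from the Stata snippet rather than from the matrix above.

```r
# Hypothetical helper: for each row where x is observed, return the number of
# years since the previous row where x was observed; NA everywhere else.
years_since_last <- function(year, x) {
  observed <- !is.na(x)
  out <- rep(NA_real_, length(year))
  # diff() over the observed years only; the first observed row has no
  # predecessor, so it gets NA
  out[observed] <- c(NA, diff(year[observed]))
  out
}

df <- data.frame(
  year   = 1960:1970,
  redist = c(NA, NA, 22, NA, NA, NA, 24, NA, NA, NA, 22)
)
df$since <- years_since_last(df$year, df$redist)
df
```

Unlike the Stata loop, which only looks back up to 10 rows, this version has no limit on the size of the gap between observations.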