
Count “gaps” between observations in R

I am having trouble replicating in R a project that was originally done in Stata. One of the key snags I'm hitting is that I need to generate a variable that counts the number of years since a certain observation. Here's a simple recreation of what the data might look like:

data <- cbind(1960:1970,
              c(NA, NA, 22, NA, NA, NA, 24, NA, NA, NA, 22),
              c(NA, NA, NA, NA, NA, NA, 4, NA, NA, NA, 4))

      [,1] [,2] [,3]
 [1,] 1960   NA   NA
 [2,] 1961   NA   NA
 [3,] 1962   22   NA
 [4,] 1963   NA   NA
 [5,] 1964   NA   NA
 [6,] 1965   NA   NA
 [7,] 1966   24    4
 [8,] 1967   NA   NA
 [9,] 1968   NA   NA
[10,] 1969   NA   NA
[11,] 1970   22    4

I currently have the first two columns of data, and I'm trying to automate the creation of column three with a function.

You can see that the third column is the number of years since the previous non-NA value in the second column. It is only filled in after the first occurrence of the intervention (i.e. from the second time column two has a value onward, not the first).

If it's any help, here is the Stata code that does this trick, where since is the third column in my simplified example. Basically, this code creates a new variable since, defined as the number of years since the variable redist (the second column in my example) last had a value, starting after the first year in which redist has a value.

gen since=.
foreach n of numlist 1(1)10 {
    replace since = year - year[_n-`n'] if redist!=. & redist[_n-`n']!=. & since==.
}

Thanks in advance for the help!

You can add a column of NA values, then fill in the differences using a logical vector. This assumes we begin with only the first two columns.

data <- cbind(data, NA)
nona <- !is.na(data[,2])
data[,3][nona] <- c(NA, diff(data[,1][nona]))

data
#      [,1] [,2] [,3]
# [1,] 1960   NA   NA
# [2,] 1961   NA   NA
# [3,] 1962   22   NA
# [4,] 1963   NA   NA
# [5,] 1964   NA   NA
# [6,] 1965   NA   NA
# [7,] 1966   24    4
# [8,] 1967   NA   NA
# [9,] 1968   NA   NA
#[10,] 1969   NA   NA
#[11,] 1970   22    4
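
If you would rather have this as a function, as the question asks, the same idea could be wrapped up roughly as follows. This is only a sketch: the name years_since_last and the data frame df are made up for illustration, and it assumes the rows are already sorted by year.

```r
# Hypothetical helper (not part of the original answer): for each non-NA
# entry of x, return the number of years since the previous non-NA entry.
# The first non-NA entry and all NA entries get NA.
years_since_last <- function(year, x) {
  stopifnot(length(year) == length(x))
  since <- rep(NA_real_, length(year))
  observed <- !is.na(x)
  if (!any(observed)) return(since)  # nothing observed, nothing to fill in
  # diff() gives the gaps between consecutive observed years;
  # the leading NA covers the first observed year.
  since[observed] <- c(NA, diff(year[observed]))
  since
}

# Same example data as above, stored in a data frame
df <- data.frame(
  year   = 1960:1970,
  redist = c(NA, NA, 22, NA, NA, NA, 24, NA, NA, NA, 22)
)
df$since <- years_since_last(df$year, df$redist)
```

One difference from the Stata loop: that loop only looks back up to 10 rows, while diff() has no such limit, so a gap longer than 10 years would still get a value here.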
