Generating a new variable in R where the nth observation depends on the n-1th observation of another column

Suppose I have a data frame that looks something like this:

>df
city  year  ceep
  1    1      1
  1    2      1
  1    3      0
  1    4      1
  1    5      0
  2    1      0
  2    2      1
  2    3      1
  2    4      0
  2    5      1
  3    1      1
  3    2      0
  3    3      1
  3    4      0
  3    5      1

Now I want to create a new variable 'veep' that depends on the values of 'city' and 'ceep' from different rows. For instance,

veep=1 if ceep[_n-1]=1 & city=city[_n-1]
veep=1 if ceep[_n+2]=1 & ceep[_n+3]=1 & city=city[_n+3] 

where n is the row number of the observation. I'm not sure how to translate these conditions into R. I guess where I'm having trouble is referring to a row relative to the current one. I'm thinking of code somewhere along the lines of:

df$veep[df$ceep(of the n-1th observation)==1 & city==city(n-1th observ.)] <- 1
df$veep[df$ceep(of the n+2th observation)==1 & df$ceep(of the n+3th observation)==1 &
city==city(n+3th observ.)] <- 1

#note: what's in parentheses is just to demonstrate where I'm having trouble 

Can anyone provide help on this?

Here's a way to write out your logical steps. Note the use of idx to index the vectors; that is necessary to avoid out-of-range indexes.

idx <- seq_len(nrow(df))

# Set a default value for the new variable
df$veep <- NA

Your first set of logical criteria cannot be applied to the first row of df, since the index n - 1 would be 0, which is not a valid row index. So use tail(*, -1) to pick out all but the first entries of veep and city, and use head(*, -1) to pick out all but the last entries of ceep and city.

df[tail(idx, -1), "veep"] <- ifelse(
  head(df$ceep, -1) == 1 &
  tail(df$city, -1) == head(df$city, -1),
  1, tail(df$veep, -1))
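To see why head(*, -1) and tail(*, -1) line up each row with the one before it, here is a minimal sketch on a toy vector. The names x, prev, curr and aligned are made up for illustration and do not appear in the answer.

```r
# Toy vector standing in for a column such as df$ceep
x <- c(10, 20, 30, 40)

prev <- head(x, -1)  # drops the last element:  10 20 30 (value at n - 1)
curr <- tail(x, -1)  # drops the first element: 20 30 40 (value at n)

# Both vectors have length(x) - 1 elements, so position k of `curr`
# (row k + 1 of the data) is paired with position k of `prev` (row k).
aligned <- data.frame(row = 2:4, current = curr, previous = prev)
print(aligned)
```

Row 1 has no predecessor, which is why the assignment above targets only tail(idx, -1).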

Your next set of criteria cannot be applied to the last three rows of df, since n + 3 would then be an invalid index. So use the head and tail functions again. One tricky part is that the first ceep condition is based on n + 2, not n + 3, so a combination of head and tail is required.

df[head(idx, -3), "veep"] <- ifelse(
  head(tail(df$ceep, -2), -1) == 1 &
  tail(df$ceep, -3) == 1 &
  head(df$city, -3) == tail(df$city, -3),
  1, head(df$veep, -3))

> df$veep
 [1] NA  1  1 NA  1 NA NA  1  1 NA NA  1 NA  1 NA
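As a cross-check, the same two rules can be written with explicit shifted copies of each column. This is only a sketch: shift_by is a hypothetical helper defined here (it is not a base R function), and the data frame is rebuilt from the table in the question so the block runs on its own.

```r
# Rebuild the example data from the question
df <- data.frame(
  city = rep(1:3, each = 5),
  year = rep(1:5, times = 3),
  ceep = c(1, 1, 0, 1, 0,  0, 1, 1, 0, 1,  1, 0, 1, 0, 1)
)

# Hypothetical helper: position n of the result holds x[n + k],
# or NA when n + k falls outside the vector.
shift_by <- function(x, k) {
  n <- length(x)
  idx <- seq_len(n) + k
  idx[idx < 1 | idx > n] <- NA
  x[idx]
}

# Rule 1: ceep[n - 1] == 1 and the previous row is in the same city
rule1 <- shift_by(df$ceep, -1) == 1 & shift_by(df$city, -1) == df$city

# Rule 2: ceep[n + 2] == 1, ceep[n + 3] == 1 and row n + 3 is in the same city
rule2 <- shift_by(df$ceep, 2) == 1 & shift_by(df$ceep, 3) == 1 &
  shift_by(df$city, 3) == df$city

# `%in% TRUE` turns NA (rule not applicable at the edges) into FALSE
df$veep <- ifelse(rule1 %in% TRUE | rule2 %in% TRUE, 1, NA)
print(df$veep)
```

On this data the second rule never fires, so the result is driven entirely by the first rule.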

You can use a for loop like this:

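df$veep <- 0

n <- nrow(df)
for (i in seq_len(n)) {
  # Rule 1: the previous row has ceep == 1 and belongs to the same city
  if (i > 1 && df[i - 1, "ceep"] == 1 && df[i - 1, "city"] == df[i, "city"])
    df[i, "veep"] <- 1
  # Rule 2: rows i + 2 and i + 3 have ceep == 1 and row i + 3 is in the same city
  if (i <= n - 3 && df[i + 2, "ceep"] == 1 && df[i + 3, "ceep"] == 1 &&
      df[i + 3, "city"] == df[i, "city"])
    df[i, "veep"] <- 1
}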
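The first rule is a within-city lag, so it can also be computed without a loop. The sketch below uses base R's ave() to shift ceep by one row inside each city. It covers only the first rule, uses 0 as the default like the loop, and assumes the rows are already sorted by city and year; prev_ceep is a name introduced here for illustration.

```r
# Rebuild the example data from the question
df <- data.frame(
  city = rep(1:3, each = 5),
  year = rep(1:5, times = 3),
  ceep = c(1, 1, 0, 1, 0,  0, 1, 1, 0, 1,  1, 0, 1, 0, 1)
)

# Within each city, shift ceep down by one row; the first row of
# every city has no predecessor and gets NA.
prev_ceep <- ave(df$ceep, df$city, FUN = function(x) c(NA, head(x, -1)))

# veep is 1 where the previous observation in the same city had ceep == 1,
# and 0 otherwise (`%in%` maps the NA rows to FALSE)
df$veep <- as.integer(prev_ceep %in% 1)
print(df$veep)
```

Because the grouping is done by ave(), no explicit city == city[n - 1] comparison is needed.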


