
Generating a new variable in R where the nth observation depends on the n-1th observation of another column

Suppose I have a data frame that looks something like this:

>df
city  year  ceep
  1    1      1
  1    2      1
  1    3      0
  1    4      1
  1    5      0
  2    1      0
  2    2      1
  2    3      1
  2    4      0
  2    5      1
  3    1      1
  3    2      0
  3    3      1
  3    4      0
  3    5      1
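For anyone who wants to reproduce this, the table above can be rebuilt with something like the sketch below. The column types are an assumption, since the printout does not show them.

```r
# Rebuild the example data frame shown above
df <- data.frame(
  city = rep(1:3, each = 5),    # three cities, five rows each
  year = rep(1:5, times = 3),   # years 1..5 within each city
  ceep = c(1, 1, 0, 1, 0,
           0, 1, 1, 0, 1,
           1, 0, 1, 0, 1)
)
```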

Now I want to create a new variable 'veep' that depends on the values of 'city' and 'ceep' from different rows. For instance,

veep=1 if ceep[_n-1]=1 & city=city[_n-1]
veep=1 if ceep[_n+2]=1 & ceep[_n+3]=1 & city=city[_n+3] 

where n is the row number of the observation. I'm not sure how to translate these conditions into R. I think my trouble is in referring to rows relative to the current one. I'm thinking of code somewhere along these lines:

df$veep[df$ceep(of the n-1th observation)==1 & city==city(n-1th observ.)] <- 1
df$veep[df$ceep(of the n+2th observation)==1 & df$ceep(of the n+3th observation)==1 &
city==city(n+3th observ.)] <- 1

#note: what's in parentheses is just to demonstrate where I'm having trouble 

Can anyone provide help on this?

Here's a way to write out your logical steps. Note the use of idx to index the vectors. That was necessary to avoid out-of-range indexes.

idx <- seq_len(nrow(df))

# Set a default value for the new variable
df$veep <- NA

Your first set of logical criteria cannot be applied to the first row of df, since the index n - 1 would be 0, and this is not a valid row index. So, use tail(*, -1) to pick out all but the first entries of veep and city, and use head(*, -1) to pick out all but the last entries of ceep and city.

df[tail(idx, -1), "veep"] <- ifelse(
  head(df$ceep, -1) == 1 &
  tail(df$city, -1) == head(df$city, -1),
  1, tail(df$veep, -1))
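To see why head(*, -1) and tail(*, -1) line up as "previous row" and "current row", here is a small sketch on a toy vector. The names x, prev and curr are made up for illustration and do not appear in the code above.

```r
x <- c(10, 20, 30, 40, 50)

prev <- head(x, -1)  # 10 20 30 40: plays the role of x[n - 1]
curr <- tail(x, -1)  # 20 30 40 50: plays the role of x[n], for n = 2..5

# Element k of `prev` is the value one row before element k of `curr`
cbind(prev, curr)
```

Both vectors have length 4, so they can be compared element by element without ever touching index 0.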

Your next set of criteria cannot be applied to the last three rows of df, since n + 3 would then be an invalid index. So use the head and tail functions again. One tricky part is that the first ceep condition is based on n + 2, not n + 3, so a combination of head and tail is required.

df[head(idx, -3), "veep"] <- ifelse(
  head(tail(df$ceep, -2), -1) == 1 &
  tail(df$ceep, -3) == 1 &
  head(df$city, -3) == tail(df$city, -3),
  1, head(df$veep, -3))

> df$veep
 [1] NA  1  1 NA  1 NA NA  1  1 NA NA  1 NA  1 NA
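If the nested head/tail calls are hard to follow, one possible alternative is a small shifting helper. This is only a sketch: shift_vec is a hypothetical function written for this example, not part of base R or of the answer above, and the data frame is rebuilt here so the block runs on its own.

```r
# Hypothetical helper: shift a vector by k rows.
# k > 0 looks back (lag), k < 0 looks ahead (lead); gaps are filled with NA.
shift_vec <- function(x, k) {
  n <- length(x)
  idx <- seq_len(n) - k
  idx[idx < 1 | idx > n] <- NA   # out-of-range positions become NA
  x[idx]
}

df <- data.frame(
  city = rep(1:3, each = 5),
  ceep = c(1, 1, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 1, 0, 1)
)

# Rule 1: previous row has ceep == 1 and belongs to the same city
rule1 <- shift_vec(df$ceep, 1) == 1 & shift_vec(df$city, 1) == df$city
# Rule 2: rows n + 2 and n + 3 have ceep == 1 and row n + 3 is the same city
rule2 <- shift_vec(df$ceep, -2) == 1 & shift_vec(df$ceep, -3) == 1 &
  shift_vec(df$city, -3) == df$city

# %in% TRUE treats NA (no such neighbouring row) as "condition not met"
df$veep <- ifelse((rule1 | rule2) %in% TRUE, 1, NA)
```

On the example data this gives the same veep vector as the one printed above. The design choice is that out-of-range neighbours turn into NA rather than being trimmed off, so every intermediate vector keeps the full length of df.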

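You can also use a for loop like this. The checks i > 1 and i <= n - 3 keep the row indexes in range, and && short-circuits so the out-of-range rows are never read:

df$veep <- 0

n <- nrow(df)
for (i in seq_len(n)) {
  # Rule 1: the previous row has ceep == 1 and is in the same city
  if (i > 1 && df[i - 1, "ceep"] == 1 && df[i - 1, "city"] == df[i, "city"])
    df[i, "veep"] <- 1
  # Rule 2: rows i + 2 and i + 3 have ceep == 1 and row i + 3 is in the same city
  if (i <= n - 3 && df[i + 2, "ceep"] == 1 && df[i + 3, "ceep"] == 1 &&
      df[i + 3, "city"] == df[i, "city"])
    df[i, "veep"] <- 1
}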


 