
Nested if-else loop in R with multiple conditions

I need to write a nested loop that goes through each ID year by year and compares several variables from data frames D1 and D2 using an if-else condition.

D1:

ID    year         X1      
 1    2000      34563     
 1    2001      34563     
 1    2002      12367     
 2    2010      14363     
 2    2011      14363     
 2    2012      13312     
 2    2013      13312     
 2    2014      13312     

D2:

year       X1      X2      
2001    34563   12367  
2011    14363   13312  
 

I created X2 in D1 (X2 is the following year's X1 in D1) by duplicating column X1 and shifting it up by 1 row. This is a rough approach: if an ID has no data for the following year, X2 should be NA, but the shift fills it with the X1 of the next ID in the data frame instead.

For an ID in D1, I need to loop through each year for that ID, and for a year N, if

  1. D1$X1 == D2$X1
  2. D1$X2 == D2$X2

then D1$G = 1; otherwise D1$G = 0.

If there is no data for year N+1, condition 2 is ignored.

Now I want to compare each row in D1 directly with D2. I tried an if-else statement as follows

D1$G <- ifelse(D1$X1 == D2$X1 & D1$X2 == D2$X2 & D1$year == D2$year, "1", "0")

However, this is what I end up with:

  ID   year      X1      X2    G
1  1   2000   34563   34563    0
2  1   2001   34563   12367    0
3  1   2002   12367   14363    0
4  2   2010   14363   14363    0
5  2   2011   14363   13312    0
6  2   2012   13312   13312    0
7  2   2013   13312   13312    0
8  2   2014   13312      NA    0

Instead of

  ID   year      X1      X2    G
1  1   2000   34563   34563    0
2  1   2001   34563   12367    1
3  1   2002   12367   14363    0
4  2   2010   14363   14363    0
5  2   2011   14363   13312    1
6  2   2012   13312   13312    0
7  2   2013   13312   13312    0
8  2   2014   13312      NA    0

I want to understand where I'm going wrong (or whether there are simpler methods). Any help is appreciated.

Reproducible code:

D1 <- data.frame(ID = c(1, 1, 1, 2, 2, 2, 2, 2),
                 year = c(2000, 2001, 2002, 2010, 2011, 2012, 2013, 2014),
                 X1 = c(34563, 34563, 12367, 14363, 14363, 13312, 13312, 13312)
)
D2 <- data.frame(year = c(2001, 2011),
                 X1 = c(34563, 14363),
                 X2 = c(12367, 13312)
)

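# creating X2 in D1: the next row's X1
# (shift() is assumed to be data.table::shift; type = "lead" shifts the column up)
library(data.table)
D1$X2 <- shift(D1$X1, n = 1, type = "lead")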

This might be helpful. Add a column G to D2 and set it to 1. Then merge the two data frames and replace the NA values in G, which mark rows of D1 with no match in D2, with 0.

library(tidyverse)

D2$G <- 1

D1 %>%
  group_by(ID) %>%
  mutate(X2 = lead(X1, 1)) %>%
  left_join(D2, by = c("year", "X1", "X2")) %>%
  replace_na(list(G = 0))

Output

     ID  year    X1    X2     G
  <dbl> <dbl> <dbl> <dbl> <dbl>
1     1  2000 34563 34563     0
2     1  2001 34563 12367     1
3     1  2002 12367    NA     0
4     2  2010 14363 14363     0
5     2  2011 14363 13312     1
6     2  2012 13312 13312     0
7     2  2013 13312 13312     0
8     2  2014 13312    NA     0
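
For readers who prefer not to load the tidyverse, the same join idea can be sketched in base R. This is only an illustration under the question's sample data: `ave()` builds the per-ID "next row" column, `merge(..., all.x = TRUE)` plays the role of `left_join()`, and the result name `res` is made up for this example.

```r
# Example data, copied from the question.
D1 <- data.frame(ID = c(1, 1, 1, 2, 2, 2, 2, 2),
                 year = c(2000, 2001, 2002, 2010, 2011, 2012, 2013, 2014),
                 X1 = c(34563, 34563, 12367, 14363, 14363, 13312, 13312, 13312))
D2 <- data.frame(year = c(2001, 2011),
                 X1 = c(34563, 14363),
                 X2 = c(12367, 13312))

# Next row's X1 within each ID; the last row of every ID gets NA.
D1$X2 <- ave(D1$X1, D1$ID, FUN = function(x) c(x[-1], NA))

# Flag the rows of D2, then left-join on all three key columns.
D2$G <- 1
res <- merge(D1, D2, by = c("year", "X1", "X2"), all.x = TRUE)

# Rows of D1 without a match come back with G = NA; turn those into 0.
res$G[is.na(res$G)] <- 0

# merge() sorts by the key columns, so restore the ID/year order.
res <- res[order(res$ID, res$year), c("ID", "year", "X1", "X2", "G")]
print(res)
```

Like the `lead()` version, this leaves X2 as NA on the last row of each ID instead of borrowing a value from the next ID.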

Edit: To explain the problem with the ifelse statement: it compares two vectors of different lengths, so the shorter one is recycled, which is likely not what you intended.

Consider two vectors from your data.frames:

year1 = c(2000, 2001, 2002, 2010, 2011, 2012, 2013, 2014)
year2 = c(2001, 2011)

If you compare them using the == operator:

year1 == year2

You will get all FALSE:

[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

This compares the elements pairwise, in order: 2000 with 2001, 2001 with 2011, 2002 with 2001 (year2 is shorter, so it is recycled from the start), 2010 with 2011, 2011 with 2001 (again), and so on.
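
To make the recycling visible, here is a small sketch that builds the recycled vector explicitly with `rep()` and lines it up against year1. The names `recycled` and `pairs` are made up for this illustration.

```r
year1 <- c(2000, 2001, 2002, 2010, 2011, 2012, 2013, 2014)
year2 <- c(2001, 2011)

# What == effectively sees on the right-hand side after recycling year2.
recycled <- rep(year2, length.out = length(year1))

# Side-by-side view of the element pairs that actually get compared.
pairs <- data.frame(year1 = year1,
                    year2_recycled = recycled,
                    equal = year1 == recycled)
print(pairs)

# The explicit version gives the same result as the implicit one.
identical(year1 == year2, year1 == recycled)
```

Here the length of year1 is a multiple of the length of year2, so R recycles silently; when it is not a multiple, R still recycles but issues a warning.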

Another way to compare the two vectors is using %in%:

year1 %in% year2

[1] FALSE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE

This gives you a logical vector that is TRUE wherever a value in year1 is contained in the vector year2.
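
If you would rather keep an ifelse than join, one possible sketch is to use `match()`, which is closely related to %in% but returns the position of the matching year in D2, so each row of D1 can be lined up with its D2 row before comparing. It assumes each year appears at most once in D2, and it reads "condition 2 is ignored" from the question as "treat condition 2 as satisfied when X2 is NA". The names `idx`, `cond1` and `cond2` are made up for this example.

```r
# Example data, copied from the question.
D1 <- data.frame(ID = c(1, 1, 1, 2, 2, 2, 2, 2),
                 year = c(2000, 2001, 2002, 2010, 2011, 2012, 2013, 2014),
                 X1 = c(34563, 34563, 12367, 14363, 14363, 13312, 13312, 13312))
D2 <- data.frame(year = c(2001, 2011),
                 X1 = c(34563, 14363),
                 X2 = c(12367, 13312))

# Next row's X1 within each ID; NA on the last row of every ID.
D1$X2 <- ave(D1$X1, D1$ID, FUN = function(x) c(x[-1], NA))

# Row of D2 that has the same year as each row of D1 (NA if the year is absent).
idx <- match(D1$year, D2$year)

# Condition 1: the year exists in D2 and X1 agrees with that D2 row.
cond1 <- !is.na(idx) & D1$X1 == D2$X1[idx]

# Condition 2: X2 agrees as well; counted as satisfied when D1 has no next-year value.
cond2 <- is.na(D1$X2) | D1$X2 == D2$X2[idx]

D1$G <- ifelse(cond1 & cond2, 1, 0)
print(D1)
```

Because `cond1` is FALSE (not NA) for every year that is missing from D2, the combined condition never evaluates to NA, so G contains only 0 and 1.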
