
Create new variable that is 0 until the first non-NA value of another variable, then 1 thereafter (within a group)

I have the following df:

df <- tibble(country = c("US", "US", "US", "US", "US", "US", "US", "US", "US", "Mex", "Mex"),
         year = c(1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2000, 2001),
         score = c(NA, NA, NA, NA, 426, NA, NA, 430, NA, 450, NA))

What I'd like to do: create a new variable before_after that is 0 in every year before a country's first non-NA value for score, and 1 from that year onward.

In other words, hard coding it, I'd like it to return the following df:

df <- tibble(country = c("US", "US", "US", "US", "US", "US", "US", "US", "US", "Mex", "Mex"),
         year = c(1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2000, 2001),
         score = c(NA, NA, NA, NA, 426, NA, NA, 430, NA, 450, NA),
         before_after = c(0,0,0,0,1,1,1,1,1,1,1))

I tried the following code, but to no avail:

df %>% 
arrange(year) %>% 
group_by(country) %>% 
mutate(before_after = ifelse(which.max(!is.na(score)),1,0)) %>% 
arrange(country, year)

Tidyverse solutions would be much appreciated, but truly any help will be immensely appreciated.

Thanks in advance!

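You can use cumsum(). Within each country, cumsum(!is.na(score)) counts the non-NA scores seen so far, so it is 0 before the first score and positive from that row onward: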

df %>%
  arrange(country, year) %>%
  group_by(country) %>%
  mutate(before_after = ifelse(cumsum(!is.na(score)) > 0, 1, 0)) 

   country  year score before_after
   <chr>   <dbl> <dbl>        <dbl>
 1 Mex      2000   450            1
 2 Mex      2001    NA            1
 3 US       1999    NA            0
 4 US       2000    NA            0
 5 US       2001    NA            0
 6 US       2002    NA            0
 7 US       2003   426            1
 8 US       2004    NA            1
 9 US       2005    NA            1
10 US       2006   430            1
11 US       2007    NA            1
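
To see why this works and why the attempt in the question does not, here is a minimal base R sketch on the US scores alone. The intermediate names (observed, running, attempt) are only illustrative and are not part of the answer above.

```r
# Scores for one country in year order (the US rows from the question)
score <- c(NA, NA, NA, NA, 426, NA, NA, 430, NA)

observed <- !is.na(score)      # TRUE where a score exists
running  <- cumsum(observed)   # count of non-NA scores seen so far
before_after <- as.integer(running > 0)

# The attempt from the question: which.max() returns a single index,
# and ifelse() treats any non-zero number as TRUE, so the result is one 1
# that mutate() then recycles to every row of the group.
attempt <- ifelse(which.max(observed), 1, 0)

print(before_after)
print(attempt)
```

Because the running count never decreases, the flag cannot switch back to 0 once a score has been seen, which is exactly the "1 thereafter" behaviour asked for.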

Use group_by() in combination with fill(). Note that, as written, this leaves NA rather than 0 in the rows before a country's first score:

library(tidyverse)

# create dataframe
df <- tibble(country = c("US", "US", "US", "US", "US", "US", "US", "US", "US", "Mex", "Mex"),
             year = c(1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2000, 2001),
             score = c(NA, NA, NA, NA, 426, NA, NA, 430, NA, 450, NA))

# sort by country and year (fill depends on row order), then
# create before_after variable with case_when
(df <- df %>%
   arrange(country, year) %>%
   mutate(before_after = case_when(!is.na(score) ~ 1)))
# A tibble: 11 x 4
   country  year score before_after
   <chr>   <dbl> <dbl>        <dbl>
 1 Mex      2000   450            1
 2 Mex      2001    NA           NA
 3 US       1999    NA           NA
 4 US       2000    NA           NA
 5 US       2001    NA           NA

# run fill
df %>%
  group_by(country) %>%
  fill(before_after)
# A tibble: 11 x 4
# Groups:   country [2]
   country  year score before_after
   <chr>   <dbl> <dbl>        <dbl>
 1 Mex      2000   450            1
 2 Mex      2001    NA            1
 3 US       1999    NA           NA
 4 US       2000    NA           NA
 5 US       2001    NA           NA
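
The fill() result above still has NA for the years before the first score, whereas the question asks for 0. A minimal sketch of one way to finish the job, assuming the same data as in the question, is to replace the remaining NA values with tidyr's replace_na() after filling. The name result is only illustrative.

```r
library(dplyr)
library(tidyr)

df <- tibble(country = c("US", "US", "US", "US", "US", "US", "US", "US", "US", "Mex", "Mex"),
             year = c(1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2000, 2001),
             score = c(NA, NA, NA, NA, 426, NA, NA, 430, NA, 450, NA))

result <- df %>%
  arrange(country, year) %>%                 # fill() depends on row order
  group_by(country) %>%
  mutate(before_after = if_else(is.na(score), NA_real_, 1)) %>%
  fill(before_after, .direction = "down") %>%  # carry the 1 forward within each country
  ungroup() %>%
  mutate(before_after = replace_na(before_after, 0))  # rows before the first score

print(result)
```

This should give the same before_after column as the cumsum() approach; the cumsum() version is shorter, while this one keeps the intermediate NA state visible if you need to inspect it.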
