
How to create a new variable from the previous year's observation in R, and set it to NA if the previous year has no observation

FIRM_ID YEAR FIRM_YEAR LOSS
1 2011 1_2011 0
1 2012 1_2012 1
1 2013 1_2013 1
1 2014 1_2014 1
2 2011 2_2011 1
2 2013 2_2013 0
2 2014 2_2014 1
3 2011 3_2011 0
3 2013 3_2013 1
3 2014 3_2014 0

Given the dataset above, I would like to use R to create a new variable called PRIOR_LOSS, which equals 1 if the company had a loss (LOSS = 1) in the previous year and 0 if it did not. For example, for observation 1_2012 it should be 0, because LOSS is 0 for 1_2011. However, some years are missing from this dataset. If the prior year is missing, PRIOR_LOSS should be a missing value (NA or similar). For example, observation 2_2013 should get a missing value, because there is no row for 2_2012.

The following code already copies the value from the previous row, but if a year is missing, it copies the value from the most recent earlier year that is present instead of returning NA:

library(dplyr)

Data <- Data %>%
  group_by(FIRM_ID) %>%
  # lag() takes the previous row within each firm, whatever its YEAR
  mutate(PRIOR_LOSS = lag(LOSS))
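To make the problem concrete, here is a minimal sketch using only the LOSS values of FIRM_ID 2 from the table above. The vector name loss_firm2 is just an illustrative name, not part of the original data:

```r
library(dplyr)

# LOSS values for FIRM_ID 2 in the years 2011, 2013, 2014 (2012 is missing)
loss_firm2 <- c(1L, 0L, 1L)

# lag() shifts by one row, not by one calendar year, so the 2013 row
# receives the 2011 value (1) rather than NA
prior <- lag(loss_firm2)
print(prior)
```

The second element is the value that ends up on observation 2_2013, which is the case that should be NA.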

We could use complete() from tidyr to add rows for the missing years within each firm, take the lag of 'LOSS' so that the previous row is always the previous calendar year, and then remove the added rows again:

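library(dplyr)
library(tidyr)

Data %>%
  group_by(FIRM_ID) %>%
  # add a row (NA in the other columns) for every missing year within each firm
  complete(YEAR = min(YEAR):max(YEAR)) %>%
  # the previous row is now always the previous calendar year
  mutate(PRIOR_LOSS = lag(LOSS)) %>%
  ungroup() %>%
  # drop the rows added by complete(); they are the ones with LOSS = NA
  filter(!is.na(LOSS))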

Output:

# A tibble: 10 × 5
   FIRM_ID  YEAR FIRM_YEAR  LOSS PRIOR_LOSS
     <int> <int> <chr>     <int>      <int>
 1       1  2011 1_2011        0         NA
 2       1  2012 1_2012        1          0
 3       1  2013 1_2013        1          1
 4       1  2014 1_2014        1          1
 5       2  2011 2_2011        1         NA
 6       2  2013 2_2013        0         NA
 7       2  2014 2_2014        1          0
 8       3  2011 3_2011        0         NA
 9       3  2013 3_2013        1         NA
10       3  2014 3_2014        0          1
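As an alternative that avoids adding and removing rows, one could compare each row's YEAR with the previous row's YEAR and keep the lagged LOSS only when the two are exactly one year apart. This is a sketch rather than part of the original answer; it assumes one row per firm and year, rebuilds the sample data without the FIRM_YEAR column for brevity, and uses result as an illustrative name:

```r
library(dplyr)

# Same sample data as in the question (FIRM_YEAR omitted for brevity)
Data <- data.frame(
  FIRM_ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L),
  YEAR    = c(2011L, 2012L, 2013L, 2014L, 2011L, 2013L, 2014L,
              2011L, 2013L, 2014L),
  LOSS    = c(0L, 1L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 0L)
)

result <- Data %>%
  group_by(FIRM_ID) %>%
  # make sure rows are in year order within each firm before lagging
  arrange(YEAR, .by_group = TRUE) %>%
  # keep the lagged LOSS only if the previous row is the previous year;
  # the first row of each firm has no previous row, so it becomes NA
  mutate(PRIOR_LOSS = if_else(lag(YEAR) == YEAR - 1L,
                              lag(LOSS),
                              NA_integer_)) %>%
  ungroup()

print(result)
```

On the sample data this gives the same PRIOR_LOSS column as the complete() approach. Unlike that approach, it does not drop rows whose own LOSS is already NA in the input.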

Data:

Data <- structure(list(FIRM_ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 
3L), YEAR = c(2011L, 2012L, 2013L, 2014L, 2011L, 2013L, 2014L, 
2011L, 2013L, 2014L), FIRM_YEAR = c("1_2011", "1_2012", "1_2013", 
"1_2014", "2_2011", "2_2013", "2_2014", "3_2011", "3_2013", "3_2014"
), LOSS = c(0L, 1L, 1L, 1L, 1L, 0L, 1L, 0L, 1L, 0L)),
 class = "data.frame", row.names = c(NA, 
-10L))

