简体   繁体   中英

How to create new column based on multiple conditions in R?

CUSTOMER LOAN   DATE    LOAN_DEFAULT CUSTOMER_DEFAULT 
1        101   201601        Y               Y            
1        102   201603        N               Y           
1        101   201501        Y               Y            
2        201   201601        N               N               
2        202   201603        N               N 

I have a dataset like above, there are multiple customers and each CUSTOMER may have several loans. A CUSTOMER with at least 1 LOAN_DEFAULT will be marked as CUSTOMER_DEFAULT.

I want to create a new column "DEFAULT_DATE" based on DATE and LOAN_DEFAULT at CUSTOMER Level, and the date is the earliest default date, like below:

CUSTOMER LOAN   DATE    LOAN_DEFAULT CUSTOMER_DEFAULT DEFAULT_DATE 
1        101   201601        Y               Y            201501
1        102   201603        N               Y            201501
1        103   201501        Y               Y            201501
2        201   201601        N               N               -
2        202   201603        N               N               -

How can I do this?

Assuming the DATE column is of format year-month, we can add an arbitrary date to convert DATE in Date class, arrange the data by CUSTOMER and DATE and get the data of first LOAN_DEFAULT for each CUSTOMER .

library(dplyr)

df %>%
  mutate(DATE = as.Date(paste0(DATE, '01'), '%Y%m%d')) %>%
  arrange(CUSTOMER, DATE) %>%
  group_by(CUSTOMER) %>%
  mutate(DEFAULT_DATE = DATE[which(LOAN_DEFAULT == 'Y')[1]])


#  CUSTOMER  LOAN DATE       LOAN_DEFAULT CUSTOMER_DEFAULT DEFAULT_DATE
#     <int> <int> <date>     <fct>        <fct>            <date>      
#1        1   101 2015-01-01 Y            Y                2015-01-01  
#2        1   101 2016-01-01 Y            Y                2015-01-01  
#3        1   102 2016-03-01 N            Y                2015-01-01  
#4        2   201 2016-01-01 N            N                NA          
#5        2   202 2016-03-01 N            N                NA       

Here is an option with data.table

library(data.table)
setDT(df)[, Date := as.IDate(paste0(DATE, '01'), '%Y%m%d'))
      ][order(CUSTOMER, DATE), DEFAULT_DATE := 
            DATE[match('Y', LOAN_DEFAULT)] , CUSTOMER][]

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM