
How to categorize value in particular column based on other column in R

I have a dataframe with the following details in it.

BatchId      Datetime              Purchase_Status        Current_Progress
PRT-10011    2021-03-01 15:18:24   Sold                   Pending
PRT-10012    2021-03-12 18:11:04   Sold                   
PRT-10013    2021-03-15 21:13:45   Open                   
PRT-10014                          Open                   
PRT-10015    2021-03-18 10:06:36   Return                 Pending
PRT-10016                          Process                Pending

dput(df)

structure(list(BatchId = c("PRT-10011", " PRT-10012", 
" PRT-10013", " PRT-10014", " PRT-10015", " PRT-10016"
), Datetime = c("2019-05-20 10:46:49", "2020-09-24 12:28:10", "2019-05-31 06:12:12",
NA, "2019-09-26 11:36:58", NA
), Purchase_Status = c("Sold", "Sold", "Open", 
"Open", "Return", "Process"), Current_Progress = c("Pending", 
NA, NA, NA, "Pending", 
"Pending")), row.names = c(12426L, 21988L, 22555L, 
12486L, 15432L, 16934L), class = "data.frame")

I need to add one more column, Category, based on the following conditions.

  • If Purchase_Status is "Sold" and Current_Progress is not blank, NA or NULL, concatenate the Purchase_Status value and the Current_Progress value with "-".
  • If Purchase_Status is "Sold" and Current_Progress is blank, NA or NULL, concatenate the Purchase_Status value with the text "Not Updated" using "-".
  • If Purchase_Status is "Open" and Datetime is not blank, NA or NULL, concatenate the Purchase_Status value with the text "Order Placed" using "-".
  • If Purchase_Status is "Open" and Datetime is blank, NA or NULL, concatenate the Purchase_Status value with the text "Order Not Placed" using "-".
  • For any Purchase_Status other than "Sold" or "Open", use "Other" and concatenate it with the text "Order Not Placed" or "Order Placed", depending on whether the Datetime column has a value.

Output df

BatchId      Datetime              Purchase_Status        Current_Progress     Category
PRT-10011    2021-03-01 15:18:24   Sold                   Pending              Sold - Pending
PRT-10012    2021-03-12 18:11:04   Sold                                        Sold - Not Updated
PRT-10013    2021-03-15 21:13:45   Open                                        Open - Order Placed
PRT-10014                          Open                                        Open - Order Not Placed
PRT-10015    2021-03-18 10:06:36   Return                 Pending              Other - Order Placed
PRT-10016                          Process                Pending              Other - Order Not Placed
library(dplyr)
library(tidyr)   # for replace_na()

df %>%
  replace_na(list(Current_Progress = "")) %>%  # simplifies below to test for just ""
                                               # instead of "" and NA
  mutate(Category = case_when(
    Purchase_Status == "Sold" & Current_Progress != "" ~ paste0(Purchase_Status, "-", Current_Progress),
    Purchase_Status == "Sold" ~ paste0(Purchase_Status, "-Not Updated"),
    # "Open" rows depend on whether Datetime is present, not on Current_Progress
    Purchase_Status == "Open" & !is.na(Datetime) ~ paste0(Purchase_Status, "-Order Placed"),
    Purchase_Status == "Open" ~ paste0(Purchase_Status, "-Order Not Placed"),
    # every remaining status is labelled "Other"
    is.na(Datetime) ~ "Other-Order Not Placed",
    TRUE ~ "Other-Order Placed")
  )

dplyr::case_when tests each condition in order, so if none of the prior cases match, the last step doesn't need a test; we can just use TRUE.

         BatchId            Datetime Purchase_Status Current_Progress               Category
12426  PRT-10011 2019-05-20 10:46:49            Sold          Pending           Sold-Pending
21988  PRT-10012 2020-09-24 12:28:10            Sold                        Sold-Not Updated
22555  PRT-10013 2019-05-31 06:12:12            Open                       Open-Order Placed
12486  PRT-10014                <NA>            Open                   Open-Order Not Placed
15432  PRT-10015 2019-09-26 11:36:58          Return          Pending     Other-Order Placed
16934  PRT-10016                <NA>         Process          Pending Other-Order Not Placed
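
To see why the order of conditions in case_when matters, here is a small standalone sketch. The vector x and the labels are made up for illustration and are not part of the question's data. Each element gets the value of the first condition that is TRUE for it, and a condition that evaluates to NA counts as not matched.

```r
library(dplyr)

# Hypothetical toy input, not taken from the question
x <- c(1, 5, 10, NA)

# Specific test first, broader test second, catch-all last
ordered <- case_when(
  x > 3 ~ "big",
  x > 0 ~ "small",    # only reached when x > 3 did not match
  TRUE  ~ "unknown"   # catch-all, also picks up the NA element
)

# Same conditions with the broad test first: "big" can never be reached
swapped <- case_when(
  x > 0 ~ "small",
  x > 3 ~ "big",
  TRUE  ~ "unknown"
)

print(ordered)  # "small" "big" "big" "unknown"
print(swapped)  # "small" "small" "small" "unknown"
```

This is why the more specific "Sold" and "Open" tests have to come before their fallback lines in the answer's code.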

Like the comments say, you should be able to use dplyr::case_when to do this. Your call should look something like

library(dplyr)  # provides %>%

df %>%
  dplyr::mutate(Category = dplyr::case_when(
    Purchase_Status == "Sold" & !is.na(Current_Progress) ~ paste(Purchase_Status, Current_Progress, sep = "-"),
    # OTHER CASES HERE
  ))

Add your other cases in the same way, mapping each condition to a value with ~ .
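
One way the remaining cases might be filled in is sketched below, using the data from the question's dput() output. The helper is_missing() is a hypothetical name introduced here so that blank strings are treated the same as NA. The "Other-" prefix is an assumption based on the question's desired output.

```r
library(dplyr)

# Data copied from the dput() output in the question
df <- structure(list(
  BatchId = c("PRT-10011", " PRT-10012", " PRT-10013",
              " PRT-10014", " PRT-10015", " PRT-10016"),
  Datetime = c("2019-05-20 10:46:49", "2020-09-24 12:28:10",
               "2019-05-31 06:12:12", NA, "2019-09-26 11:36:58", NA),
  Purchase_Status = c("Sold", "Sold", "Open", "Open", "Return", "Process"),
  Current_Progress = c("Pending", NA, NA, NA, "Pending", "Pending")),
  row.names = c(12426L, 21988L, 22555L, 12486L, 15432L, 16934L),
  class = "data.frame")

# Hypothetical helper: TRUE when a value is NA or a blank/whitespace-only string
is_missing <- function(x) is.na(x) | trimws(x) == ""

result <- df %>%
  mutate(Category = case_when(
    Purchase_Status == "Sold" & !is_missing(Current_Progress) ~
      paste(Purchase_Status, Current_Progress, sep = "-"),
    Purchase_Status == "Sold" ~ paste(Purchase_Status, "Not Updated", sep = "-"),
    Purchase_Status == "Open" & !is_missing(Datetime) ~
      paste(Purchase_Status, "Order Placed", sep = "-"),
    Purchase_Status == "Open" ~ paste(Purchase_Status, "Order Not Placed", sep = "-"),
    # any status other than "Sold" or "Open" is labelled "Other"
    !is_missing(Datetime) ~ "Other-Order Placed",
    TRUE ~ "Other-Order Not Placed"
  ))

print(result$Category)
```

If the separator should be " - " with spaces, as in the question's expected table, changing sep and the two literal "Other-..." strings would be enough.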
