I have a dataframe with the following details in it.
BatchId    Datetime             Purchase_Status  Current_Progress
PRT-10011  2021-03-01 15:18:24  Sold             Pending
PRT-10012  2021-03-12 18:11:04  Sold
PRT-10013  2021-03-15 21:13:45  Open
PRT-10014                       Open
PRT-10015  2021-03-18 10:06:36  Return           Pending
PRT-10016                       Process          Pending
dput(df)
structure(list(BatchId = c("PRT-10011", " PRT-10012",
" PRT-10013", " PRT-10014", " PRT-10015", " PRT-10016"
), Datetime = c("2019-05-20 10:46:49", "2020-09-24 12:28:10", "2019-05-31 06:12:12",
NA, "2019-09-26 11:36:58", NA
), Purchase_Status = c("Sold", "Sold", "Open",
"Open", "Return", "Process"), Current_Progress = c("Pending",
NA, NA, NA, "Pending",
"Pending")), row.names = c(12426L, 21988L, 22555L,
12486L, 15432L, 16934L), class = "data.frame")
I need to add one more column, Category, based on the following conditions:
- If Purchase_Status is Sold and Current_Progress is not blank, NA or null, concatenate the Purchase_Status value and the Current_Progress value with "-".
- If Purchase_Status is Sold and Current_Progress is blank, NA or null, concatenate the Purchase_Status value with the text "Not Updated" using "-".
- If Purchase_Status is Open and Datetime is not blank, NA or null, concatenate the Purchase_Status value with the text "Order Placed" using "-".
- If Purchase_Status is Open and Datetime is blank, NA or null, concatenate the Purchase_Status value with the text "Order Not Placed" using "-".
- If Purchase_Status is anything other than "Sold" or "Open", use "Other" and concatenate it with the text "Order Not Placed" or "Order Placed", depending on whether the Datetime column has a value.
Output df
BatchId    Datetime             Purchase_Status  Current_Progress  Category
PRT-10011  2021-03-01 15:18:24  Sold             Pending           Sold - Pending
PRT-10012  2021-03-12 18:11:04  Sold                               Sold - Not Updated
PRT-10013  2021-03-15 21:13:45  Open                               Open - Order Placed
PRT-10014                       Open                               Open - Order Not Placed
PRT-10015  2021-03-18 10:06:36  Return           Pending           Other - Order Placed
PRT-10016                       Process          Pending           Other - Order Not Placed
library(dplyr)
library(tidyr)

df %>%
  replace_na(list(Current_Progress = "")) %>%  # simplifies below to test for just ""
                                               # instead of "" and NA
  mutate(Category = case_when(
    Purchase_Status == "Sold" & Current_Progress != "" ~ paste0(Purchase_Status, "-", Current_Progress),
    Purchase_Status == "Sold"                          ~ paste0(Purchase_Status, "-Not Updated"),
    # the "Open" cases depend on Datetime, not on Current_Progress
    Purchase_Status == "Open" & !is.na(Datetime)       ~ paste0(Purchase_Status, "-Order Placed"),
    Purchase_Status == "Open"                          ~ paste0(Purchase_Status, "-Order Not Placed"),
    # anything else is labelled "Other"
    is.na(Datetime)                                    ~ "Other-Order Not Placed",
    TRUE                                               ~ "Other-Order Placed"
  ))
dplyr::case_when tests each condition in order, so if none of the prior cases match, the last step doesn't need a test; we can just take it as TRUE.
         BatchId            Datetime Purchase_Status Current_Progress               Category
12426  PRT-10011 2019-05-20 10:46:49            Sold          Pending           Sold-Pending
21988  PRT-10012 2020-09-24 12:28:10            Sold                        Sold-Not Updated
22555  PRT-10013 2019-05-31 06:12:12            Open                       Open-Order Placed
12486  PRT-10014                <NA>            Open                   Open-Order Not Placed
15432  PRT-10015 2019-09-26 11:36:58          Return          Pending     Other-Order Placed
16934  PRT-10016                <NA>         Process          Pending Other-Order Not Placed
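The question asks for blank values to be treated the same as NA, and in real data a "blank" may be an empty or whitespace-only string rather than NA. A minimal sketch of one way to cover both, assuming a helper named is_blank (hypothetical, not part of dplyr) and a small made-up data frame called toy:

```r
library(dplyr)

# Hypothetical helper: TRUE for NA and for empty or whitespace-only strings.
is_blank <- function(x) is.na(x) | trimws(x) == ""

# Made-up data for illustration; it mixes NA, "" and " " as "missing" values.
toy <- data.frame(
  Purchase_Status  = c("Sold", "Sold", "Open", "Open", "Return", "Process"),
  Datetime         = c("2019-05-20 10:46:49", "", "2019-05-31 06:12:12",
                       NA, " ", "2019-09-26 11:36:58"),
  Current_Progress = c("Pending", " ", NA, NA, "Pending", ""),
  stringsAsFactors = FALSE
)

result <- toy %>%
  mutate(Category = case_when(
    Purchase_Status == "Sold" & !is_blank(Current_Progress) ~
      paste(Purchase_Status, Current_Progress, sep = "-"),
    Purchase_Status == "Sold"                       ~ "Sold-Not Updated",
    Purchase_Status == "Open" & !is_blank(Datetime) ~ "Open-Order Placed",
    Purchase_Status == "Open"                       ~ "Open-Order Not Placed",
    !is_blank(Datetime)                             ~ "Other-Order Placed",
    TRUE                                            ~ "Other-Order Not Placed"
  ))

result$Category
```

With the helper in place there is no need to call replace_na first, because NA and blank strings are handled by the same test.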
As the comments say, you should be able to use dplyr::case_when to do this. Your call should look something like
df %>%
  dplyr::mutate(Category = dplyr::case_when(
    Purchase_Status == "Sold" & !is.na(Current_Progress) ~ paste(Purchase_Status, Current_Progress, sep = "-"),
    # OTHER CASES HERE
  ))
adding your other cases and mapping them to a value with ~.
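When adding the remaining cases, their order matters, because case_when returns the value of the first condition that is TRUE for each row. A small sketch on made-up vectors (status and progress are illustrative names, not columns of df) showing why the more specific case should come first:

```r
library(dplyr)

# Made-up inputs for illustration only.
status   <- c("Sold", "Sold", "Open")
progress <- c("Pending", NA, NA)

# Specific case first: "Sold" with a known progress is matched before
# the general "Sold" fallback.
specific_first <- case_when(
  status == "Sold" & !is.na(progress) ~ paste(status, progress, sep = "-"),
  status == "Sold"                    ~ "Sold-Not Updated",
  TRUE                                ~ "Other"
)

# General case first: every "Sold" row stops at the first condition,
# so the second condition is never reached.
general_first <- case_when(
  status == "Sold"                    ~ "Sold-Not Updated",
  status == "Sold" & !is.na(progress) ~ paste(status, progress, sep = "-"),
  TRUE                                ~ "Other"
)

specific_first
general_first
```

In the first call the "Sold"/"Pending" element becomes "Sold-Pending"; in the second it becomes "Sold-Not Updated", because the broader test catches it first.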