How to categorize value in particular column based on other column in R
I have a dataframe containing the following details.
BatchId    Datetime             Purchase_Status  Current_Progress
PRT-10011  2021-03-01 15:18:24  Sold             Pending
PRT-10012  2021-03-12 18:11:04  Sold
PRT-10013  2021-03-15 21:13:45  Open
PRT-10014                       Open
PRT-10015  2021-03-18 10:06:36  Return           Pending
PRT-10016                       Process          Pending
dput(df)
structure(list(BatchId = c("PRT-10011", " PRT-10012",
" PRT-10013", " PRT-10014", " PRT-10015", " PRT-10016"
), Datetime = c("2019-05-20 10:46:49", "2020-09-24 12:28:10", "2019-05-31 06:12:12",
NA, "2019-09-26 11:36:58", NA
), Purchase_Status = c("Sold", "Sold", "Open",
"Open", "Return", "Process"), Current_Progress = c("Pending",
NA, NA, NA, "Pending",
"Pending")), row.names = c(12426L, 21988L, 22555L,
12486L, 15432L, 16934L), class = "data.frame")
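I need to add one more column, Category, with the following conditions.
- If Purchase_Status is Sold and Current_Progress is not blank, NA or null, concatenate the Purchase_Status value and the Current_Progress value with "-".
- If Purchase_Status is Sold and Current_Progress is blank, NA or null, concatenate the Purchase_Status value with the text "Not Updated" using "-".
- If Purchase_Status is Open and Datetime is not blank, NA or null, concatenate the Purchase_Status value with the text "Order Placed" using "-".
- If Purchase_Status is Open and Datetime is blank, NA or null, concatenate the Purchase_Status value with the text "Order Not Placed" using "-".
- For every other Purchase_Status, set it to Other and concatenate it with the text "Order Not Placed" or "Order Placed", depending on whether the Datetime column has a value.
Expected output df: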
BatchId    Datetime             Purchase_Status  Current_Progress  Category
PRT-10011  2021-03-01 15:18:24  Sold             Pending           Sold - Pending
PRT-10012  2021-03-12 18:11:04  Sold                               Sold - Not Updated
PRT-10013  2021-03-15 21:13:45  Open                               Open - Order Placed
PRT-10014                       Open                               Open - Order Not Placed
PRT-10015  2021-03-18 10:06:36  Return           Pending           Other - Order Placed
PRT-10016                       Process          Pending           Other - Order Not Placed
library(dplyr)
library(tidyr)  # for replace_na()

df %>%
  replace_na(list(Current_Progress = "")) %>%  # simplifies below to test for just ""
                                               # instead of "" and NA
  mutate(Category = case_when(
    Purchase_Status == "Sold" & Current_Progress != "" ~ paste0(Purchase_Status, "-", Current_Progress),
    Purchase_Status == "Sold"                          ~ paste0(Purchase_Status, "-Not Updated"),
    # the Open cases depend on Datetime, not on Current_Progress
    Purchase_Status == "Open" & !is.na(Datetime)       ~ paste0(Purchase_Status, "-Order Placed"),
    Purchase_Status == "Open"                          ~ paste0(Purchase_Status, "-Order Not Placed"),
    is.na(Datetime)                                    ~ "Other-Order Not Placed",
    TRUE                                               ~ "Other-Order Placed"
  ))
dplyr::case_when tests each condition in order, so the last step does not need a test of its own: if none of the earlier cases matched, we can simply treat it as TRUE.
      BatchId   Datetime            Purchase_Status Current_Progress Category
12426 PRT-10011 2019-05-20 10:46:49 Sold            Pending          Sold-Pending
21988 PRT-10012 2020-09-24 12:28:10 Sold                             Sold-Not Updated
22555 PRT-10013 2019-05-31 06:12:12 Open                             Open-Order Placed
12486 PRT-10014 <NA>                Open                             Open-Order Not Placed
15432 PRT-10015 2019-09-26 11:36:58 Return          Pending          Other-Order Placed
16934 PRT-10016 <NA>                Process         Pending          Other-Order Not Placed
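To make the ordering behaviour concrete, here is a minimal sketch on a made-up vector (x and the labels are illustrative only, not part of the question's data). It shows that a later condition is never reached by values an earlier condition already matched.

```r
library(dplyr)

# Illustrative input, not taken from the question
x <- c(5, 15, 25)

# Conditions are checked top to bottom; the first one that is TRUE wins.
first_match <- case_when(
  x > 10 ~ "big",
  x > 20 ~ "huge",   # never used here: anything > 20 already matched x > 10
  TRUE   ~ "small"   # fallback for everything left over
)

print(first_match)
```

This is why the more specific test (for example Sold with a non-empty Current_Progress) has to come before the more general one (Sold alone).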
As the comments say, you should be able to do this with dplyr::case_when. Your call should look something like
df %>%
  dplyr::mutate(Category = dplyr::case_when(
    Purchase_Status == "Sold" & !is.na(Current_Progress) ~ paste(Purchase_Status, Current_Progress, sep = "-")
    # OTHER CASES HERE (each one separated by a comma)
  ))
Add your other cases and use ~ to map each of them to a value.
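One way the remaining cases might be filled in is sketched below. It assumes the sample values from the question's table, a " - " separator as in the expected output, and that "blank" means either NA or an empty string. The helper is_blank and the temporary column order_text are hypothetical names introduced only for this illustration.

```r
library(dplyr)

# Sample data rebuilt from the table in the question
df <- data.frame(
  BatchId = c("PRT-10011", "PRT-10012", "PRT-10013",
              "PRT-10014", "PRT-10015", "PRT-10016"),
  Datetime = c("2021-03-01 15:18:24", "2021-03-12 18:11:04",
               "2021-03-15 21:13:45", NA, "2021-03-18 10:06:36", NA),
  Purchase_Status = c("Sold", "Sold", "Open", "Open", "Return", "Process"),
  Current_Progress = c("Pending", NA, NA, NA, "Pending", "Pending")
)

# Hypothetical helper: TRUE for NA and for empty or whitespace-only strings
is_blank <- function(x) is.na(x) | trimws(x) == ""

result <- df %>%
  mutate(
    # Text that depends only on whether Datetime has a value
    order_text = if_else(is_blank(Datetime), "Order Not Placed", "Order Placed"),
    Category = case_when(
      Purchase_Status == "Sold" & !is_blank(Current_Progress) ~
        paste(Purchase_Status, Current_Progress, sep = " - "),
      Purchase_Status == "Sold" ~ "Sold - Not Updated",
      Purchase_Status == "Open" ~ paste("Open", order_text, sep = " - "),
      TRUE                      ~ paste("Other", order_text, sep = " - ")
    )
  ) %>%
  select(-order_text)  # drop the temporary column

print(result)
```

Computing the Datetime-dependent text once keeps the Open and Other branches from repeating the same test.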