
How to categorize value in particular column based on other column in R

I have a dataframe with the following details.

BatchId      Datetime              Purchase_Status        Current_Progress
PRT-10011    2021-03-01 15:18:24   Sold                   Pending
PRT-10012    2021-03-12 18:11:04   Sold                   
PRT-10013    2021-03-15 21:13:45   Open                   
PRT-10014                          Open                   
PRT-10015    2021-03-18 10:06:36   Return                 Pending
PRT-10016                          Process                Pending

dput(df)

structure(list(BatchId = c("PRT-10011", " PRT-10012", 
" PRT-10013", " PRT-10014", " PRT-10015", " PRT-10016"
), Datetime = c("2019-05-20 10:46:49", "2020-09-24 12:28:10", "2019-05-31 06:12:12",
NA, "2019-09-26 11:36:58", NA
), Purchase_Status = c("Sold", "Sold", "Open", 
"Open", "Return", "Process"), Current_Progress = c("Pending", 
NA, NA, NA, "Pending", 
"Pending")), row.names = c(12426L, 21988L, 22555L, 
12486L, 15432L, 16934L), class = "data.frame")

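I need to add one more column, Category, based on the following conditions:

  • If Purchase_Status is Sold and Current_Progress is not blank, NA or null, concatenate the Purchase_Status value and the Current_Progress value with "-"
  • If Purchase_Status is Sold and Current_Progress is blank, NA or null, concatenate the Purchase_Status value with the text "Not Updated" using "-"
  • If Purchase_Status is Open and Datetime is not blank, NA or null, concatenate the Purchase_Status value with the text "Order Placed" using "-"
  • If Purchase_Status is Open and Datetime is blank, NA or null, concatenate the Purchase_Status value with the text "Order Not Placed" using "-"
  • For every Purchase_Status other than "Sold" and "Open", set it to Other and concatenate it with the text "Order Not Placed" or "Order Placed", depending on whether the Datetime column has a value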

Output df

BatchId      Datetime              Purchase_Status        Current_Progress     Category
PRT-10011    2021-03-01 15:18:24   Sold                   Pending              Sold - Pending
PRT-10012    2021-03-12 18:11:04   Sold                                        Sold - Not Updated
PRT-10013    2021-03-15 21:13:45   Open                                        Open - Order Placed
PRT-10014                          Open                                        Open - Order Not Placed
PRT-10015    2021-03-18 10:06:36   Return                 Pending              Other - Order Placed
PRT-10016                          Process                Pending              Other - Order Not Placed

library(dplyr)
library(tidyr)  # for replace_na()

df %>%
  replace_na(list(Current_Progress = "")) %>%  # simplifies below to test for just "" 
                                               # instead of "" and NA
  mutate(Category = case_when(
    Purchase_Status == "Sold" & Current_Progress != "" ~ paste0(Purchase_Status, "-", Current_Progress),
    Purchase_Status == "Sold" ~ paste0(Purchase_Status, "-Not Updated"),
    # the Open cases depend on Datetime, not on Current_Progress
    Purchase_Status == "Open" & !is.na(Datetime) ~ paste0(Purchase_Status, "-Order Placed"),
    Purchase_Status == "Open" ~ paste0(Purchase_Status, "-Order Not Placed"),
    # every remaining status is labelled "Other"
    is.na(Datetime) ~ "Other-Order Not Placed",
    TRUE ~ "Other-Order Placed")
  )

dplyr::case_when tests each condition in order, so the last step needs no test of its own: if none of the earlier cases matched, we can simply treat it as TRUE.

         BatchId            Datetime Purchase_Status Current_Progress               Category
12426  PRT-10011 2019-05-20 10:46:49            Sold          Pending           Sold-Pending
21988  PRT-10012 2020-09-24 12:28:10            Sold                        Sold-Not Updated
22555  PRT-10013 2019-05-31 06:12:12            Open                       Open-Order Placed
12486  PRT-10014                <NA>            Open                   Open-Order Not Placed
15432  PRT-10015 2019-09-26 11:36:58          Return          Pending     Other-Order Placed
16934  PRT-10016                <NA>         Process          Pending Other-Order Not Placed
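
To make the point about ordering concrete, here is a small sketch that is not taken from the answer itself. The vector `x` and the labels are made up for illustration; the only thing it relies on is that `case_when` assigns each element the result of the first condition that is TRUE for it.

```r
library(dplyr)

# Toy data, purely illustrative
x <- c(5, 15, 25)

# Most specific condition first: each value gets the first branch it matches.
specific_first <- case_when(
  x > 20 ~ "high",
  x > 10 ~ "medium",
  TRUE   ~ "low"
)
print(specific_first)  # "low" "medium" "high"

# Broad condition first: x > 10 already catches 25, so "high" is never used.
broad_first <- case_when(
  x > 10 ~ "medium",
  x > 20 ~ "high",
  TRUE   ~ "low"
)
print(broad_first)     # "low" "medium" "medium"
```

This is why the `Purchase_Status == "Sold"` line without a second test can safely follow the more specific "Sold" line: rows that matched the first one never reach it.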

As the comments say, you should be able to do this with dplyr::case_when. Your call should look something like

library(dplyr)  # also provides %>%

df %>%
  dplyr::mutate(Category = dplyr::case_when(
    Purchase_Status == "Sold" & !is.na(Current_Progress) ~ paste(Purchase_Status, Current_Progress, sep = "-")
    # add a comma after the line above, then put the OTHER CASES HERE
  ))

Add your other cases and use ~ to map each one to a value.
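
One way the skeleton might be filled in is sketched below. It is not the answerer's code. The data frame is rebuilt from the question's dput output (with the stray leading spaces in BatchId dropped and default row names), and `is_blank` is a hypothetical helper introduced here because the question asks to treat blank strings the same way as NA. The "-" separator follows the answers rather than the " - " shown in the desired output.

```r
library(dplyr)

# Data rebuilt from the question's dput() output
df <- data.frame(
  BatchId = c("PRT-10011", "PRT-10012", "PRT-10013",
              "PRT-10014", "PRT-10015", "PRT-10016"),
  Datetime = c("2019-05-20 10:46:49", "2020-09-24 12:28:10",
               "2019-05-31 06:12:12", NA, "2019-09-26 11:36:58", NA),
  Purchase_Status = c("Sold", "Sold", "Open", "Open", "Return", "Process"),
  Current_Progress = c("Pending", NA, NA, NA, "Pending", "Pending"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE for NA, "" and whitespace-only strings
is_blank <- function(x) is.na(x) | trimws(x) == ""

result <- df %>%
  dplyr::mutate(Category = dplyr::case_when(
    Purchase_Status == "Sold" & !is_blank(Current_Progress) ~
      paste(Purchase_Status, Current_Progress, sep = "-"),
    Purchase_Status == "Sold"                       ~ "Sold-Not Updated",
    Purchase_Status == "Open" & !is_blank(Datetime) ~ "Open-Order Placed",
    Purchase_Status == "Open"                       ~ "Open-Order Not Placed",
    # anything that is neither Sold nor Open falls through to "Other"
    !is_blank(Datetime)                             ~ "Other-Order Placed",
    TRUE                                            ~ "Other-Order Not Placed"
  ))

print(result$Category)
```

If the Datetime column were parsed to POSIXct instead of kept as character, `is.na(Datetime)` alone would probably be the simpler test, since a parsed date cannot be an empty string.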
