How do I create a column based on values of another using dplyr without having to write down every value?

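Is there a more efficient way to do this? I want to create a column of item types. Each participant has a different number of items, which makes this really tricky. Here is a toy example of my data: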

df <- structure(list(id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L), condition = c("high", "high", "high", "high", "high", 
"high", "high", "high", "medium", "medium", "medium", "medium", 
"medium", "medium", "medium", "low", "low", "low", "low", "low", 
"low", "low", "low", "low", "low", "low", "low", "low", "low", 
"low", "high", "high", "high", "high", "high", "high", "high", 
"medium", "medium", "medium", "medium", "medium", "medium", "medium"
), item = c("abcde", "bcdef", "cdefgh", "defgh", "efghi", "fghijk", 
"ghijkl", "hijklm", "1234", "2345", "3456", "4567", "5678", "6789", 
"7890", "onion", "celery", "tomato", "carrot", "green bean", 
"lettuce", "garlic", "abcde", "bcdef", "cdefgh", "defgh", "efghi", 
"fghijk", "ghijkl", "hijklm", "onion", "celery", "tomato", "carrot", 
"green bean", "lettuce", "garlic", "1234", "2345", "3456", "4567", 
"5678", "6789", "7890")), row.names = c(NA, -44L), class = c("tbl_df", 
"tbl", "data.frame"))

This is what I have done so far, but it is a nightmare because I have over a hundred different items:

df$subs <- 0
df$subs[df$item=="abcde"] <- "A"
df$subs[df$item=="bcdef"] <- "A"
df$subs[df$item=="cdefg"] <- "A"
df$subs[df$item=="defgh"] <- "A"
df$subs[df$item=="efghi"] <- "A"

df$subs[df$item=="12345"] <- "B"
df$subs[df$item=="23456"] <- "B"
df$subs[df$item=="34567"] <- "B"
df$subs[df$item=="45678"] <- "B"
df$subs[df$item=="56789"] <- "B"

df$subs[df$item=="onion"] <- "C"
df$subs[df$item=="celery"] <- "C"
df$subs[df$item=="tomato"] <- "C"
df$subs[df$item=="carrot"] <- "C"
df$subs[df$item=="green bean"] <- "C"

Is there a faster way to do this with the tidyverse?

I think the easiest approach is to use regular expressions to simplify this. You can use the tidyverse, but it is not strictly necessary. Here is a tidyverse example:

library(tidyverse)

df %>% 
  mutate(subs = case_when(
    str_detect(item, "[a-m]{5}") ~ "A",  # five consecutive letters in the range a-m
    str_detect(item, "\\d+")     ~ "B",  # contains at least one digit
    TRUE                         ~ "C"   # everything else
  ))
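One way to check what these patterns assign is to run them over the distinct items alone. The following is a minimal sketch, assuming the 22 distinct item strings from the toy data; the names `items` and `check` are introduced here for illustration only.

```r
library(dplyr)
library(stringr)

# Distinct items copied from the question's toy data
items <- tibble(item = c(
  "abcde", "bcdef", "cdefgh", "defgh", "efghi", "fghijk", "ghijkl", "hijklm",
  "1234", "2345", "3456", "4567", "5678", "6789", "7890",
  "onion", "celery", "tomato", "carrot", "green bean", "lettuce", "garlic"
))

# Same rules as above: the first matching condition wins
check <- items %>%
  mutate(subs = case_when(
    str_detect(item, "[a-m]{5}") ~ "A",
    str_detect(item, "\\d+")     ~ "B",
    TRUE                         ~ "C"
  ))

# Number of distinct items that landed in each category
count(check, subs)
```

On this toy data the letter strings should all land in "A", the digit strings in "B", and the vegetables in "C". Printing `check` in full makes any misclassified item easy to spot.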

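The key here is that the regex patterns you choose need to be precise for your real data. I think this version works for the simple example you included. Also, note that the dput output differs from the data shown in your code, which is why I chose to use [a-m] in the first regex pattern.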
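Since the tidyverse is not strictly necessary, the same logic can be written in base R. This is a rough sketch rather than a definitive implementation: it uses a small stand-in data frame (an assumption, with only an `item` column) and nested `ifelse()` calls with `grepl()` in place of `case_when()` and `str_detect()`.

```r
# Small stand-in for the question's data; only the item column matters here
df <- data.frame(
  item = c("abcde", "fghijk", "1234", "7890", "onion", "green bean"),
  stringsAsFactors = FALSE
)

# Nested ifelse() mirrors case_when(): the first matching pattern wins
df$subs <- ifelse(grepl("[a-m]{5}", df$item), "A",
           ifelse(grepl("\\d+", df$item), "B", "C"))

print(df)
```

With more than two or three categories the nesting gets hard to read, which is where `case_when()` tends to be the more comfortable choice.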
