
R: Update Column Based on Text Condition from Another Column

I would like to make a new column in my data frame using a conditional statement along the lines of "If Column_y contains Column_x then 1 else 0".

For example:

Event   Name     Winner       Loser          New Column
1       James    James,Bob    John,Steve     1
1       Bob      James,Bob    John,Steve     1
1       John     James,Bob    John,Steve     0
1       Steve    James,Bob    John,Steve     0

I want New Column <- "If Winner contains Name then 1 else 0".

Keep in mind this is for 100,000 rows and probably 700 unique names. When I try things like

df$NewColumn<-ifelse(grepl(df$Name,df$Winner)==TRUE,1,0) 

or variations, I get the "pattern has a length > 1" error.

I think you just want to compare the Name column against the Winner column:

df$NewColumn <- ifelse(df$Name == df$Winner, 1, 0)

Note that because df$Name == df$Winner is already a boolean expression, you might also be able to simplify to the following, which gives TRUE/FALSE rather than 1/0:

df$NewColumn <- df$Name == df$Winner
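If the 1/0 coding is needed, wrapping the comparison in as.integer() is a possible shortcut that avoids ifelse(). A minimal sketch on made-up data, assuming Winner holds exactly one name per row (the data frame below is hypothetical, not the asker's):

```r
# Hypothetical data: Winner holds a single name per row
df <- data.frame(
  Name   = c("James", "Bob", "John"),
  Winner = c("James", "James", "John"),
  stringsAsFactors = FALSE
)

# The comparison is element-wise and returns a logical vector;
# as.integer() maps TRUE to 1 and FALSE to 0
df$NewColumn <- as.integer(df$Name == df$Winner)
df$NewColumn
```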

Exact string matching works only where Winner holds a single name, and I am assuming that does not hold true for your entire data.

Implementing the contains condition would look something like this:


library(dplyr)
library(purrr)

# Pair each Winner with its own Name; 1 if Winner contains Name, else 0
df <- df %>% 
  dplyr::mutate(NewColumn = purrr::map2_dbl(.x = Winner, .y = Name, ~ ifelse(grepl(.y, .x), 1, 0)))
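For anyone who wants to try the same row-wise check without extra packages, here is a minimal base R sketch. The data frame is rebuilt by hand from the example in the question, and fixed = TRUE is my own assumption that names should be matched as literal text rather than as regular expressions.

```r
# Sample data rebuilt from the question's example
df <- data.frame(
  Event  = c(1, 1, 1, 1),
  Name   = c("James", "Bob", "John", "Steve"),
  Winner = rep("James,Bob", 4),
  Loser  = rep("John,Steve", 4),
  stringsAsFactors = FALSE
)

# grepl() is not vectorised over its pattern argument, so mapply() is used
# to pair each Name with the Winner value from the same row.
# fixed = TRUE treats the name as literal text (an assumption).
contains_name <- mapply(
  function(name, winner) grepl(name, winner, fixed = TRUE),
  df$Name, df$Winner,
  USE.NAMES = FALSE
)

# Convert TRUE/FALSE to 1/0
df$NewColumn <- as.integer(contains_name)
df$NewColumn
```

On the four sample rows this gives 1 for James and Bob and 0 for John and Steve, matching the New Column shown in the question.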

Adding an alternate solution with stringr:

# str_detect() is vectorised over both arguments, so no explicit loop is needed
df <- df %>% 
  dplyr::mutate(NewColumn = ifelse(stringr::str_detect(Winner, Name), 1, 0))

Let me know if this works.

PS: str_detect is faster, since it is vectorised over both the string and the pattern instead of looping row by row.
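One caveat with substring checks such as grepl or str_detect: a name like Bob would also match a Winner entry such as Bobby. If that matters for your data, one possible sketch is to split Winner into individual names and test for an exact match. This assumes names in Winner are separated by commas with no surrounding spaces, as in the example; the last row below is hypothetical and added only to show the difference.

```r
df <- data.frame(
  Name   = c("James", "Bob", "John", "Bob"),
  # The last Winner value is hypothetical, added to illustrate the caveat
  Winner = c("James,Bob", "James,Bob", "James,Bob", "Bobby,James"),
  stringsAsFactors = FALSE
)

# Split each Winner string into its individual names (a list of vectors)
winner_names <- strsplit(df$Winner, ",", fixed = TRUE)

# 1 when Name is exactly one of the split names, else 0
df$NewColumn <- as.integer(mapply(
  function(name, winners) name %in% winners,
  df$Name, winner_names,
  USE.NAMES = FALSE
))
df$NewColumn
```

Here the last row gets 0, whereas a substring check would report a match because Bobby contains Bob.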
