Delete rows containing specific words with additional conditions in R
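I want to get rid of the rows whose keyword contains the word "plan", unless it also contains "advertising" or "marketing". In the sample dataset specifically, the rows with the keywords "hr plan" and "operation plan" should be removed.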
keyword <- c("advertising plan",
"advertising budget",
"marketing plan",
"marketing budget",
"hr plan",
"hr budget",
"operation plan",
"operation budget")
indicator <- c(1,0,0,1,1,1,0,1)
sample <- cbind(keyword,indicator)
Without using a fancy regular expression, I would probably just combine two rules:
sample[!(grepl("plan", sample[,"keyword"]) &
(!grepl("marketing|advertising", sample[,"keyword"]))),]
# keyword indicator
#[1,] "advertising plan" "1"
#[2,] "advertising budget" "0"
#[3,] "marketing plan" "0"
#[4,] "marketing budget" "1"
#[5,] "hr budget" "1"
#[6,] "operation budget" "1"
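Note that cbind() on a character vector and a numeric vector returns a character matrix, which is why indicator prints as "1" and "0" above. If the numeric type matters, the same two rules can be applied to a data.frame instead. This is only a sketch of that variant; the names dat, has_plan, is_exempt and kept are made up for illustration and are not part of the question.

```r
keyword <- c("advertising plan", "advertising budget",
             "marketing plan", "marketing budget",
             "hr plan", "hr budget",
             "operation plan", "operation budget")
indicator <- c(1, 0, 0, 1, 1, 1, 0, 1)

# data.frame() keeps indicator numeric; cbind() would coerce it to character
dat <- data.frame(keyword, indicator, stringsAsFactors = FALSE)

# rule 1: the keyword mentions "plan"
has_plan <- grepl("plan", dat$keyword, fixed = TRUE)
# rule 2: the keyword mentions "marketing" or "advertising"
is_exempt <- grepl("marketing|advertising", dat$keyword)

# drop a row only when it has "plan" and is not exempt
kept <- dat[!(has_plan & !is_exempt), ]
kept
```

Naming the two logical vectors makes the condition easier to read and to extend with further exemptions later.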
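Here is a possible solution using regular expressions and the stringr package. As mentioned in the comments, I extended indicator by 2 values. Basically, you want to use a regular expression to detect which keyword values either do not contain "plan" or start with "advertising" or "marketing". HTH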
library("tidyverse")
library("stringr")
keyword <- c("advertising plan",
"advertising budget",
"marketing plan",
"marketing budget",
"hr plan",
"hr budget",
"operation plan",
"operation budget")
indicator <- c(1,0,1,0,0,1,1,1)
# tibble() replaces the deprecated data_frame()
df <- tibble(keyword, indicator)
# keep rows that lack "plan" or that start with "advertising" or "marketing"
df %>%
  filter(!str_detect(keyword, "plan") |
         str_detect(keyword, "^(advertising|marketing)"))
# A tibble: 6 × 2
#   keyword            indicator
#   <chr>                  <dbl>
# 1 advertising plan           1
# 2 advertising budget         0
# 3 marketing plan             1
# 4 marketing budget           0
# 5 hr budget                  1
# 6 operation budget           1
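One detail worth keeping in mind about anchors and alternation: in a pattern such as "^advertising|marketing" the ^ binds only to the first alternative, so "marketing" is matched anywhere in the string. For the sample data this makes no difference, but it would for other keywords. A small sketch with base R's grepl(); the vector x and the keyword "digital marketing plan" are made up for illustration and are not in the sample data.

```r
# hypothetical keywords, not taken from the sample dataset
x <- c("advertising plan", "marketing plan", "digital marketing plan")

# ^ applies only to "advertising"; "marketing" may appear anywhere
loose <- grepl("^advertising|marketing", x)
loose   # TRUE TRUE TRUE

# grouping makes ^ apply to both alternatives
strict <- grepl("^(advertising|marketing)", x)
strict  # TRUE TRUE FALSE
```

Which behaviour is wanted depends on whether "also contains" or "starts with" is the intended rule.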