
Delete rows containing specific words with additional conditions in R

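I want to get rid of rows whose keyword contains the word "plan", unless the keyword also includes "advertising" or "marketing". In the sample dataset specifically, the rows with the keywords "hr plan" and "operation plan" should be removed.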

keyword <- c("advertising plan",
             "advertising budget",
             "marketing plan",
             "marketing budget",
             "hr plan",
             "hr budget",
             "operation plan",
             "operation budget")
indicator <- c(1,0,0,1,1,1,0,1)
sample <- cbind(keyword,indicator)

Without using a fancy regular expression, I would probably just combine two rules:

sample[!(grepl("plan", sample[,"keyword"]) &
        (!grepl("marketing|advertising", sample[,"keyword"]))),]
#     keyword              indicator
#[1,] "advertising plan"   "1"      
#[2,] "advertising budget" "0"      
#[3,] "marketing plan"     "0"      
#[4,] "marketing budget"   "1"      
#[5,] "hr budget"          "1"      
#[6,] "operation budget"   "1" 
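
The two-rule filter above can also be sketched on a data.frame instead of the character matrix that cbind() produces, which keeps indicator numeric. This is only an illustrative variant: the names sample_df, has_plan, is_exempt and kept are not from the original post.

```r
keyword <- c("advertising plan", "advertising budget",
             "marketing plan", "marketing budget",
             "hr plan", "hr budget",
             "operation plan", "operation budget")
indicator <- c(1, 0, 0, 1, 1, 1, 0, 1)

# data.frame() keeps indicator numeric; cbind() would coerce it to character
sample_df <- data.frame(keyword, indicator, stringsAsFactors = FALSE)

# Rule 1: the keyword mentions "plan" (fixed = TRUE matches it literally)
has_plan <- grepl("plan", sample_df$keyword, fixed = TRUE)

# Rule 2: the keyword mentions one of the exempting words
is_exempt <- grepl("marketing|advertising", sample_df$keyword)

# Drop only the rows that mention "plan" without an exempting word,
# i.e. "hr plan" and "operation plan"
kept <- sample_df[!(has_plan & !is_exempt), ]
kept
```

Naming the two logical vectors makes each rule easy to inspect on its own before they are combined.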

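Here is a possible solution using regular expressions and the stringr package. As mentioned in the comments, I extended indicator by 2 values. Basically, you use a regular expression to detect which keywords either do not contain "plan" or start with "advertising" or "marketing". hth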

library("tidyverse")
library("stringr")

keyword <- c("advertising plan",
             "advertising budget",
             "marketing plan",
             "marketing budget",
             "hr plan",
             "hr budget",
             "operation plan",
             "operation budget")

indicator <- c(1,0,1,0,0,1,1,1)

# tibble() replaces the deprecated data_frame()
df <- tibble(keyword, indicator)

# Keep rows that either lack "plan" or start with "advertising" or "marketing"
df %>%
  filter(!str_detect(keyword, "plan") |
           str_detect(keyword, "^(advertising|marketing)"))

# A tibble: 6 × 2
             keyword indicator
               <chr>     <dbl>
1   advertising plan         1
2 advertising budget         0
3     marketing plan         1
4   marketing budget         0
5          hr budget         1
6   operation budget         1
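
A note on anchoring in patterns of this kind: in a regular expression, alternation (`|`) binds more loosely than `^`, so in `^advertising|marketing` the anchor applies only to "advertising", while "marketing" can match anywhere in the string. The base R sketch below illustrates the difference. The keyword "digital marketing plan" is a made-up example, not part of the original data, and both patterns give the same result on the sample keywords.

```r
# Hypothetical keywords; "digital marketing plan" is invented for illustration
x <- c("marketing plan", "digital marketing plan", "advertising plan")

# "^" binds only to "advertising"; "marketing" may match anywhere
loose <- grepl("^advertising|marketing", x)
loose
# TRUE TRUE TRUE

# Grouping applies "^" to both alternatives
strict <- grepl("^(advertising|marketing)", x)
strict
# TRUE FALSE TRUE
```

Which form is right depends on the intent: the question asks for keywords that include "advertising" or "marketing", in which case the anchor can be dropped altogether.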
