Finding and replacing text in R

I recently started learning R and am trying to explore more by automating a process. Below is the sample data; I'm trying to create a new column by finding and replacing particular text within the label (column name: Designations).

Since I get this work with loads of new data, I would like to automate it in R rather than with Excel formulas.

Dataset:

strings<-c("Zonal Manager","Department Manager","Network Manager","Head of Sales","Account Manager","Alliance Manager","Additional Manager","Senior Vice President","General manager","Senior Analyst", "Solution Architect","AGM")

R code I used:

t<-data.frame(strings,stringsAsFactors = FALSE)
colnames(t)[1]<-"Designations"
y<-sub(".*Manager*","Manager",strings,ignore.case = TRUE)

Challenge:
With this, the data just got changed to Manager, but I need to replace the other designations with their main themes.

I tried ifelse, grep, grepl, str, sub, etc., but I didn't get what I'm looking for.

I can't use the first/second/last word (as a delimiter), since the position of the main theme varies. E.g.: Chief Information Officer, Commercial Finance Manager, or AGM.

Excel work:
I have already coded 300 main themes as...

Manager (for all GM, Asst.Manager, Sales Manager, etc)
Architect (Solution Arch, Sr. Arch, etc)
Director (Senior Director, Director, Asst.Director, etc)
Senior Analyst
Analyst
Head (for head of sales)

What I'm looking for: I need to create a new column in R that replaces the text with the relevant main theme, as I did in Excel.

It would also be fine to reuse the main themes I have already coded in Excel and match against them in R (like VLOOKUP in Excel).

Expected result: (image in the original post)

Thanks in advance for your help!

Yes, that is exactly what I'm expecting. Thanks!! But when I tried the same approach after loading a new dataset (an Excel file) with

df %>% 
   mutate(theme=gsub(".*(Manager|Lead|Director|Head|Administrator|Executive|Executive|VP|President|Consultant|CFO|CTO|CEO|CMO|CDO|CIO|COO|Cheif Executive Officer|Chief Technological Officer|Chief Digital Officer|Chief Financial Officer|Chief Marketing Officer|Chief Digital Officer|Chief Information Officer,Chief Operations Officer)).*","\\1",Designations,ignore.case = TRUE))

it didn't work. Should I correct something else?
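
Two things may be worth checking in the pattern above: it has one more closing parenthesis than opening ones, and a comma sits where a `|` was presumably intended between the last two titles. A minimal sketch, assuming the themes are kept in a character vector (the name `themes` and the shortened list are illustrative, not from the post), builds the alternation with paste() so the delimiters cannot be mistyped:

```r
# Hypothetical, shortened theme list; extend it with the full set as needed.
themes <- c("Manager", "Lead", "Director", "Head", "President",
            "Chief Information Officer", "Chief Operations Officer", "CIO")

# Join the themes with "|" and wrap them in a single capture group.
pattern <- paste0(".*(", paste(themes, collapse = "|"), ").*")

designations <- c("Zonal Manager", "Head of Sales", "Chief Information Officer")

# Replace each whole string with the captured theme.
theme <- gsub(pattern, "\\1", designations, ignore.case = TRUE)
theme
```

Keeping the themes in a vector also makes duplicates and misspellings easier to spot than in one long string.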

Data:

strings<-c("Zonal Manager","Department Manager","Network Manager","Head of Sales","Account Manager",
           "Alliance Manager","Additional Manager","Senior Vice President","General manager","Senior Analyst", "Solution Architect","AGM")

You need to prepare a good lookup table (complete it yourself and make it perfect):

lu_table <- data.frame(new = c("Manager", "Architect","Director"), old = c("Manager|GM","Architect|Arch","Director"), stringsAsFactors = FALSE)

Then you can let mapply do the job:

mapply(function(new,old) {ans <- strings; ans[grepl(old,ans)]<-new; strings <<- ans; return(NULL)}, new = lu_table$new, old = lu_table$old)

Now look at strings:

> strings
 [1] "Manager"               "Manager"               "Manager"               "Head of Sales"         "Manager"               "Manager"              
 [7] "Manager"               "Senior Vice President" "General manager"       "Senior Analyst"        "Architect"             "Manager" 

Please note:

This solution uses <<-, so it might not be the best possible solution, but it works in this case.
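
For comparison, here is a sketch of the same lookup idea without <<-. It assumes a small helper (the name `recode_themes` is hypothetical) that loops over the lookup rows and returns a recoded copy, so the original column stays intact and the result can go into a new column. Unlike the mapply version, it matches every pattern against the original text, and the `ignore.case` argument is an addition so that "General manager" is caught as well:

```r
strings <- c("Zonal Manager", "Department Manager", "Network Manager",
             "Head of Sales", "Account Manager", "Alliance Manager",
             "Additional Manager", "Senior Vice President", "General manager",
             "Senior Analyst", "Solution Architect", "AGM")

lu_table <- data.frame(new = c("Manager", "Architect", "Director"),
                       old = c("Manager|GM", "Architect|Arch", "Director"),
                       stringsAsFactors = FALSE)

# Hypothetical helper: returns a recoded copy and leaves its input untouched.
recode_themes <- function(x, lookup, ignore.case = FALSE) {
  out <- x
  for (i in seq_len(nrow(lookup))) {
    # Match against the original text; later lookup rows win on overlap.
    hit <- grepl(lookup$old[i], x, ignore.case = ignore.case)
    out[hit] <- lookup$new[i]
  }
  out
}

df <- data.frame(Designations = strings, stringsAsFactors = FALSE)
df$theme <- recode_themes(df$Designations, lookup = lu_table, ignore.case = TRUE)
df
```

Entries that match no row of the lookup table (such as "Head of Sales" here) keep their original text, which makes gaps in the table easy to find.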

Do you mean something like this?

library(dplyr)
strings <-
  c(
    "Zonal Manager",
    "Department Manager",
    "Network Manager",
    "Head of Sales",
    "Account Manager",
    "Alliance Manager",
    "Additional Manager",
    "Senior Vice President",
    "General manager",
    "Senior Analyst",
    "Solution Architect",
    "AGM"
  )

df = data.frame(Designations = strings)


df %>%
  mutate(
    theme = gsub(
      ".*(manager|head|analyst|architect|agm|director|president).*",
      "\\1",
      Designations,
      ignore.case = TRUE
    )
  )
#>             Designations     theme
#> 1          Zonal Manager   Manager
#> 2     Department Manager   Manager
#> 3        Network Manager   Manager
#> 4          Head of Sales      Head
#> 5        Account Manager   Manager
#> 6       Alliance Manager   Manager
#> 7     Additional Manager   Manager
#> 8  Senior Vice President President
#> 9        General manager   manager
#> 10        Senior Analyst   Analyst
#> 11    Solution Architect Architect
#> 12                   AGM       AGM

Created on 2018-10-04 by the reprex package (v0.2.1)
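
Note that the output above keeps whatever case appears in the text (row 9 gives "manager"), and gsub() returns the whole string unchanged when nothing matches. A possible variation in base R, sketched under a few assumptions: the names `themes` and `canonical` are illustrative, "Office Boy" is a made-up non-matching entry, and regexpr() picks the first theme found in the string rather than the last one, as the greedy pattern does.

```r
designations <- c("Zonal Manager", "General manager", "Head of Sales",
                  "AGM", "Office Boy")  # last entry is a made-up non-match

# Canonical spellings, keyed by their lower-case form.
themes <- c("Manager", "Head", "Analyst", "Architect", "AGM",
            "Director", "President")
canonical <- setNames(themes, tolower(themes))

pattern <- paste(themes, collapse = "|")

# regexpr() returns -1 where nothing matches; keep those rows as NA.
m <- regexpr(pattern, designations, ignore.case = TRUE)
found <- rep(NA_character_, length(designations))
found[m != -1] <- regmatches(designations, m)

# Map each hit to its canonical spelling; NA stays NA.
theme <- unname(canonical[tolower(found)])
theme
```

Returning NA for unmatched rows makes it straightforward to list the designations that still need a theme, for example with `designations[is.na(theme)]`.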
