
How to search for multiple strings and replace them with nothing within a list of strings

I have a column in a data frame like this:

npt2$name
#  [1] "Andreas Groll, M.D."
#  [2] ""
#  [3] "Pan-Chyr Yang, PHD"
#  [4] "Suh-Fang Jeng, Sc.D"
#  [5] "Mostafa K Mohamed Fontanet Arnaud"
#  [6] "Thomas Jozefiak, M.D."
#  [7] "Medical Monitor"
#  [8] "Qi Zhu, MD"
#  [9] "Holly Posner"
# [10] "Peter S Sebel, MB BS, PhD Chantal Kerssens, PhD"
# [11] "Lance A Mynderse, M.D."
# [12] "Lawrence Currie, MD"

I tried gsub but with no luck. After doing toupper(x), I need to replace all instances of 'MD', 'M.D.' or 'PHD' with nothing.

Is there a nice short trick to do it?

In fact, I would be interested to see how it is done on a single string, and how that differs from doing it in one command on the whole column.

Either of these:

gsub("MD|M\\.D\\.|PHD", "", test)  # target specific strings
gsub("\\,.+$", "", test)        # target all characters after comma
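On the single-string versus whole-column question: gsub() is vectorised, so the same call works on one string or on a whole vector, and for a data frame you simply assign the result back to the column. A minimal sketch using a few values from the question; `npt2` here is a small stand-in data frame built for the example, not the asker's real data.

```r
# A few values taken from the question
one  <- "Lawrence Currie, MD"
many <- c("Andreas Groll, M.D.", "Pan-Chyr Yang, PHD",
          "Peter S Sebel, MB BS, PhD Chantal Kerssens, PhD")

# Single string: the result is a length-1 character vector
gsub("MD|M\\.D\\.|PHD", "", one)
# "Lawrence Currie, "

# Whole vector: the same call, one result per element (uppercased first, as in the question)
gsub("MD|M\\.D\\.|PHD", "", toupper(many))
# c("ANDREAS GROLL, ", "PAN-CHYR YANG, ", "PETER S SEBEL, MB BS,  CHANTAL KERSSENS, ")

# Data frame column: assign the result back (npt2 is a stand-in for the question's data)
npt2 <- data.frame(name = many, stringsAsFactors = FALSE)
npt2$name <- gsub("\\,.+$", "", npt2$name)
npt2$name
# c("Andreas Groll", "Pan-Chyr Yang", "Peter S Sebel")
```

Note that the comma-based pattern also drops "Chantal Kerssens" from the entry that lists two people, while the first pattern leaves a trailing ", " behind.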

Both Matt Parker above and Tommy below have raised the question of whether 'MRCP', 'PhD', 'D.Phil.', 'Ph.D.', or other British or Continental designations of doctorate-level degrees should also be sought out and removed. Perhaps @user56 can advise what the intent was.
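If more designations do need to go, one option is to keep them in a vector and build the alternation from it, so the list is easy to extend. A minimal sketch; the `degrees` vector and the two extra names ("Jane Doe", "John Roe") are made up for illustration. Each period is turned into the character class `[.]` so it is matched literally rather than as "any character".

```r
# Hypothetical list of designations to strip; extend as needed
degrees <- c("M.D.", "MD", "Ph.D.", "PhD", "PHD", "D.Phil.", "MRCP")

# Make each period literal by replacing "." with the character class "[.]"
escaped <- gsub(".", "[.]", degrees, fixed = TRUE)

# ",? *" also removes a preceding comma and any spaces
pattern <- paste0(",? *(", paste(escaped, collapse = "|"), ")")

# Two names from the question plus two made-up ones
names_in <- c("Andreas Groll, M.D.", "Pan-Chyr Yang, PHD",
              "Jane Doe, D.Phil.", "John Roe, MRCP", "Holly Posner")
gsub(pattern, "", names_in)
# c("Andreas Groll", "Pan-Chyr Yang", "Jane Doe", "John Roe", "Holly Posner")
```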

With a single ugly regex:

gsub('[MP].?D.?','',npt2$name)  # [MP] = M or P; a comma inside [ ] would be matched as a literal comma

Which says: find the character M or P, followed by zero or one character of any kind, followed by a D and zero or one additional character. More explicitly, you could do this in three steps:

npt2$name <- gsub('MD','',npt2$name)
npt2$name <- gsub('M\\.D\\.','',npt2$name)  # periods escaped so they match literally
npt2$name <- gsub('PhD','',npt2$name)

In those three, what's happening should be more straightforward. In the second replacement you need to "escape" the period, since it is a special character that matches any single character.
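To see why the escaping matters once the names are uppercased, here is a small sketch on three values from the question. An unescaped "M.D." (and likewise the single `[MP].?D.?` regex) also hits "MEDI" in "MEDICAL" and "MED " in "MOHAMED"; escaping the periods, or passing fixed = TRUE so the pattern is treated as plain text, avoids that.

```r
# Three values from the question, uppercased as the asker intends
x <- toupper(c("Andreas Groll, M.D.", "Medical Monitor",
               "Mostafa K Mohamed Fontanet Arnaud"))

# Unescaped: each "." matches any character, so "MEDI" and "MED " are removed too
gsub("M.D.", "", x)
# c("ANDREAS GROLL, ", "CAL MONITOR", "MOSTAFA K MOHAFONTANET ARNAUD")

# Escaped periods, or fixed = TRUE, match only the literal text "M.D."
gsub("M\\.D\\.", "", x)
gsub("M.D.", "", x, fixed = TRUE)
# both: c("ANDREAS GROLL, ", "MEDICAL MONITOR", "MOSTAFA K MOHAMED FONTANET ARNAUD")

# The single [MP].?D.? regex over-matches in the same places on uppercased text
gsub("[MP].?D.?", "", x)
# c("ANDREAS GROLL, ", "CAL MONITOR", "MOSTAFA K MOHAFONTANET ARNAUD")
```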

Here's a variant that removes the extra ", " too. It does not require toupper either, but if you want case-insensitive matching, just pass ignore.case=TRUE to gsub.

test <- c("Andreas Groll, M.D.", 
  "",
  "Pan-Chyr Yang, PHD",
  "Suh-Fang Jeng, Sc.D",
  "Peter S Sebel, MB BS, PhD Chantal Kerssens, PhD",
  "Lawrence Currie, MD")

gsub(",? *(MD|M\\.D\\.|P[hH]D)", "", test)
#[1] "Andreas Groll"                         ""                                     
#[3] "Pan-Chyr Yang"                         "Suh-Fang Jeng, Sc.D"                  
#[5] "Peter S Sebel, MB BS Chantal Kerssens" "Lawrence Currie"
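Following the ignore.case=TRUE suggestion, a sketch: with case-insensitive matching, a plain PHD in the alternation also covers "PhD", so P[hH]D is no longer needed, and on this `test` vector the result is the same as the output above. One caveat: a case-insensitive MD would also match a lowercase "md" inside a name, which this sample happens not to contain.

```r
test <- c("Andreas Groll, M.D.", "", "Pan-Chyr Yang, PHD",
          "Suh-Fang Jeng, Sc.D",
          "Peter S Sebel, MB BS, PhD Chantal Kerssens, PhD",
          "Lawrence Currie, MD")

# ignore.case = TRUE: "PHD" also matches "PhD", no toupper() needed
out <- gsub(",? *(MD|M\\.D\\.|PHD)", "", test, ignore.case = TRUE)
out
# c("Andreas Groll", "", "Pan-Chyr Yang", "Suh-Fang Jeng, Sc.D",
#   "Peter S Sebel, MB BS Chantal Kerssens", "Lawrence Currie")
```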

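Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.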

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM