Remove/replace specific words or phrases from a string - R

Question

I looked around both here and elsewhere and found many similar questions, but none that exactly answers mine. I need to clean up naming conventions, specifically to replace or remove certain words and phrases in one specific column/variable, not in the entire dataset. I am migrating from SPSS to R. I have an example of the code that does this in SPSS below, but I am not sure how to do it in R.

For example:

"Acadia Parish" --> "Acadia" (removes "Parish" and the space before it)

"Fifth District" --> "Fifth" (removes "District" and the space before it)

SPSS syntax:

COMPUTE county=REPLACE(county,' Parish','').

The column has 32,000 cases and only a few kinds of this issue. What needs replacing or removing varies, and the affected values repeat (there are dozens of instances of a phrase containing 'Parish'), so it is much faster to code exactly what needs to be removed or replaced. This is not as simple or clean as a regular expression that removes all spaces, all characters after a specific word or character, all special characters, etc. The match must also be able to include leading spaces.

I have looked at replace(), gsub() and other similar functions in R, but they all involve creating vectors, or at least seem to. What I'd like is syntax that looks for the characters I specify, which can include leading or trailing spaces, and replaces them with something I specify, which can be nothing at all. If the specified characters are not found, the case is left unchanged.

Yes, I will end up repeating the same syntax many times, and it is probably easier to create a vector, but if possible I'd like to get the syntax I described, as there are other similar operations I need to do as well.

Thank you for looking.

Answer 1

Maybe I'm missing something, but I don't see why you can't simply use alternation (the | operator) in your regular expression and then trim the leftover whitespace.

string <- c("Acadia Parish", "Fifth District")

bad_words <- c("Parish", "District") # Write all the words you want removed here!
bad_regex <- paste(bad_words, collapse = "|") # "Parish|District"

# sub() removes the first match in each element; trimws() drops the leftover space
trimws( sub(bad_regex, "", string) )

# [1] "Acadia" "Fifth"
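If the leading space should be part of the match, as the question asks, one possible variant is to put the space into each phrase and anchor the pattern at the end of the string. This is only a sketch: the `county` vector, the "Orleans" value, and the end-of-string anchor are assumptions made for illustration, not part of the original answer.

```r
# Illustrative input; "Orleans" shows that non-matching values stay unchanged.
county <- c("Acadia Parish", "Fifth District", "Orleans")

# Phrases to remove, each including its leading space.
phrases <- c(" Parish", " District")

# Builds "( Parish| District)$": any of the phrases, only at the end of the value.
pattern <- paste0("(", paste(phrases, collapse = "|"), ")$")

cleaned <- sub(pattern, "", county)
cleaned
# -> "Acadia", "Fifth", "Orleans"
```

Because the space is removed together with the word, no separate trimws() step is needed here. Drop the trailing `$` if the phrase can also occur in the middle of a value.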

Answer 2

dataframename$varname <- gsub(" Parish","", dataframename$varname)
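The line above is the closest match to the SPSS REPLACE call. As a rough sketch of how it might be extended to several phrases, the example below loops over a named vector of find/replace pairs. The data frame `df`, its `county` column, and the `replacements` vector are hypothetical names used only for illustration; `fixed = TRUE` is added so that each phrase is matched as literal text rather than as a regular expression.

```r
# Hypothetical data frame and column, for illustration only.
df <- data.frame(county = c("Acadia Parish", "Fifth District", "Orleans"),
                 stringsAsFactors = FALSE)

# Each name is the literal text to find; each value is its replacement.
replacements <- c(" Parish" = "", " District" = "")

for (old in names(replacements)) {
  # fixed = TRUE treats `old` as literal text, not as a regular expression
  df$county <- gsub(old, replacements[[old]], df$county, fixed = TRUE)
}

df$county
# -> "Acadia", "Fifth", "Orleans"
```

Values that contain none of the phrases, such as "Orleans" here, pass through unchanged, and only the one column is touched.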

Answer 3

> x <- c("Acadia Parish", "Fifth District")
> x2 <- gsub("^(\\w*).*$", "\\1", x)
> x2
[1] "Acadia" "Fifth"

Legend:

^ Start of the string.
() Capture group.
\\w* Zero or more word characters (letters, digits, or underscore).
.* Zero or more occurrences of any character, which consumes the rest of the string.
$ End of the string.
\\1 Refers to the text captured by the first group.
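One thing worth checking before using this pattern: it keeps only the first run of word characters, so a multi-word name loses everything after its first word. The sketch below compares it with a pattern that removes only a trailing " Parish" or " District". The input "East Baton Rouge Parish" and the alternative pattern are assumptions added for illustration and are not part of the original answer.

```r
# Illustrative input, including one multi-word name.
x <- c("Acadia Parish", "East Baton Rouge Parish", "Fifth District")

# Pattern from the answer: keeps only the first run of word characters.
first_word <- gsub("^(\\w*).*$", "\\1", x)
first_word
# -> "Acadia", "East", "Fifth"

# Alternative: strip only a trailing " Parish" or " District".
suffix_removed <- sub(" (Parish|District)$", "", x)
suffix_removed
# -> "Acadia", "East Baton Rouge", "Fifth"
```

Which behavior is right depends on the data: the first pattern is fine when every value is a single word followed by a suffix, while the second leaves the rest of the name intact.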
