Replace a whole word containing a pattern - gsub and R
I am trying to clean some garbage out of some text. While doing this, I am assuming that any word that has a letter (any letter) repeated three or more times is garbage, and I want to remove it.
I've come up with this:
gsub(pattern = "[a-zA-Z]\\1\\1", replacement = "", string)
in which string is the character vector, but this doesn't work. Everything else I've tried might find the pattern, but it just removes the pattern, leaving a mess. I'm trying to remove the whole word with the pattern in it.
Any ideas?
You need to turn the [a-zA-Z] character class into a "capture group" by wrapping it in parentheses, since the backreference \\1 needs something to refer to:
gsub("([a-zA-Z])\\1\\1", "", "aabbbccdddee")
# [1] "aaccee"
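To see what the question means by "leaving a mess", here is a minimal sketch that applies the same capture-group pattern to a full sentence. The sentence is the sample used in the other answer, and the variable name x is only illustrative. The backreference now works, but only the run of three identical letters is deleted, not the word around it.

```r
# Sample text; `x` is an illustrative name, not from the original post.
x <- "This is a baaaad unnnnecessary short word"

# The capture group makes the backreference valid, but gsub() only deletes
# the three repeated letters; the rest of each word is left behind.
res <- gsub("([a-zA-Z])\\1\\1", "", x)
print(res)
# [1] "This is a bad unecessary short word"
```

Removing the whole word requires the pattern to also match the letters on either side of the repeated run.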
You need one of the following:
gsub("\\s*[[:alpha:]]*([[:alpha:]])\\1{2}[[:alpha:]]*", "", string)
gsub("\\s*\\p{L}*(\\p{L})\\1{2}\\p{L}*", "", string, perl=TRUE)
stringr::str_replace_all(string, "\\s*\\p{L}*(\\p{L})\\1{2}\\p{L}*", "")
string <- "This is a baaaad unnnnecessary short word"
gsub("\\s*[[:alpha:]]*([[:alpha:]])\\1{2}[[:alpha:]]*", "", string)
gsub("\\s*\\p{L}*(\\p{L})\\1{2}\\p{L}*", "", string, perl=TRUE)
library(stringr)
str_replace_all(string, "\\s*\\p{L}*(\\p{L})\\1{2}\\p{L}*", "")
All yield [1] "This is a short word".
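One edge case may be worth checking: \\s* only consumes whitespace in front of a garbage word, so when the garbage word is the first word of the string, the space after it stays in the result. The sketch below assumes that trimming with base R's trimws() is acceptable for your data; the helper name remove_garbage_words and the sample inputs are hypothetical.

```r
# Hypothetical helper wrapping the base-R pattern from this answer.
remove_garbage_words <- function(x) {
  pattern <- "\\s*[[:alpha:]]*([[:alpha:]])\\1{2}[[:alpha:]]*"
  # trimws() drops the leading space left when the first word is removed.
  trimws(gsub(pattern, "", x))
}

# Without trimming, a leading garbage word leaves a leading space.
raw <- gsub("\\s*[[:alpha:]]*([[:alpha:]])\\1{2}[[:alpha:]]*", "",
            "baaaad short word")
print(raw)
# [1] " short word"

cleaned <- remove_garbage_words(c("baaaad short word",
                                  "This is a baaaad unnnnecessary short word"))
print(cleaned)
# [1] "short word"           "This is a short word"
```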
See the regex demo. Regex details:
\s* - zero or more whitespace characters
\p{L}* / [[:alpha:]]* - zero or more letters
(\p{L}) / ([[:alpha:]]) - capturing group 1: any single letter
\1{2} - two occurrences of the same value as in group 1
\p{L}* / [[:alpha:]]* - zero or more letters.
r2evans example with a different regex:
gsub("(\\w)\\1{2,}", "", "aabbbccdddee")
# [1] "aaccee"
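An open-ended quantifier such as {2,} makes a difference when a letter repeats more than three times and the pattern has nothing after the backreference to absorb the extra letters. A minimal comparison on an illustrative input (the variable names are assumptions, not from the original answers):

```r
x <- "aabbbbcc"  # illustrative input with a run of four b's

# Exactly two repeats after the captured letter: only three b's are removed.
exactly_three <- gsub("(\\w)\\1{2}", "", x)
# Two or more repeats: the whole run of b's is removed.
three_or_more <- gsub("(\\w)\\1{2,}", "", x)

print(exactly_three)
# [1] "aabcc"
print(three_or_more)
# [1] "aacc"
```

In the whole-word patterns above, \\1{2} is enough because the trailing [[:alpha:]]* or \\p{L}* consumes any remaining letters of the word.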