Regex to remove everything except letters and remove multiple spaces
I'm trying to make a single regex to remove everything except letters and apostrophes, while also collapsing runs of spaces.
I tried [^\\p{L} ']+ with a lookbehind for the extra spaces, (?<=\\s)\\s+. Each works in isolation:
gsub("(?<=\\s)\\s+", "", "I like 56 dogs that's him55.", perl = TRUE)
## [1] "I like 56 dogs that's him55."
gsub("[^\\p{L} ']+", "", "I like 56 dogs that's him55.", perl = TRUE)
## [1] "I like  dogs that's him"
But when I use alternation (|) to connect them:
gsub("((?<=\\s)\\s+)|([^\\p{L} ']+)", "", "I like 56 dogs that's him55.", perl = TRUE)
This returns:
[1] "I like  dogs that's him"
I'd like it to also remove the extra space (between "like" and "dogs"), like this:
[1] "I like dogs that's him"
How can I use one regex to remove everything except letters and apostrophes, and also collapse the extra spaces?
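A quick way to see why the combined pattern leaves the gap is to list what it actually matches. The sketch below uses base R's regmatches() and gregexpr(); the variable names are only illustrative. The lookbehind is evaluated against the original string, not against the partially cleaned result, so each space next to 56 is still preceded by a non-space character and the first branch never fires.

```r
x <- "I like 56 dogs that's him55."
pattern <- "((?<=\\s)\\s+)|([^\\p{L} ']+)"

# List every substring the combined pattern matches in the original string.
matched <- regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
print(matched)
# Only the runs "56" and "55." are matched; no space is ever matched.
```

Removing just those two runs leaves both of the spaces that surrounded 56 in place.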
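It seems the problem is in your regex. The version below instead turns every run of characters that are not letters or apostrophes (digits included) into a single space, and it works fine for me: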
gsub("[^\\p{L}']+", " ", "I like 56 dogs that's him55.", perl = TRUE)
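One detail worth checking with this approach: the trailing "55." is also replaced by a space, so the result ends with one. A minimal sketch, assuming a stray trailing space is unwanted, that removes it with base R's trimws() (the variable names are illustrative):

```r
x <- "I like 56 dogs that's him55."

# Replace each run of non-letter, non-apostrophe characters with one space.
res <- gsub("[^\\p{L}']+", " ", x, perl = TRUE)

# res ends with a space left by "55."; trimws() strips leading/trailing whitespace.
cleaned <- trimws(res)
print(cleaned)
# [1] "I like dogs that's him"
```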
If you want to do this in a single call, you can try the following:
x <- "I like 56 dogs that's him55."
gsub("[^\\pL' ]+\\h+(?=\\h)|\\h+(?=[^\\pL' ]+)|[^\\pL' ]+", "", x, perl = TRUE)
# [1] "I like dogs that's him"
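The single-call pattern is easier to follow when its three alternatives are written out separately. The sketch below rebuilds the same pattern with paste(); the comments are one reading of what each branch does, not the answerer's own explanation.

```r
x <- "I like 56 dogs that's him55."

pattern <- paste(
  "[^\\pL' ]+\\h+(?=\\h)",  # unwanted run plus the spaces after it, if one more space follows
  "\\h+(?=[^\\pL' ]+)",     # spaces sitting directly before an unwanted run
  "[^\\pL' ]+",             # any remaining unwanted run
  sep = "|"
)

out <- gsub(pattern, "", x, perl = TRUE)
print(out)
# [1] "I like dogs that's him"
```

Under this reading, the pattern only drops spaces that sit next to removed characters. Spaces that are already doubled in the input, with nothing removed between them (for example "a  b"), are left as they are.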
Here is another approach, which is more efficient in my opinion:
x <- "I like 56 dogs that's him55."
r <- gsub("[^\\pL' ]+", '', x, perl = TRUE)
paste(strsplit(r, '\\s+')[[1]], collapse = ' ')
# [1] "I like dogs that's him"
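Because strsplit(...)[[1]] only looks at the first element, the two-step version above handles one string at a time. A possible vectorized variant, using a second gsub() and trimws() in place of the split and paste, is sketched below; clean_text is a hypothetical helper name, not something from the answer.

```r
# Hypothetical helper: strip unwanted characters, then normalize whitespace.
clean_text <- function(x) {
  # Step 1: drop everything except letters, apostrophes, and spaces.
  stripped <- gsub("[^\\pL' ]+", "", x, perl = TRUE)
  # Step 2: collapse whitespace runs to one space and trim both ends.
  trimws(gsub("\\s+", " ", stripped))
}

result <- clean_text(c("I like 56 dogs that's him55.", "  99 red   balloons!  "))
print(result)
# [1] "I like dogs that's him" "red balloons"
```

Unlike the single-call regex, this also collapses spaces that were already repeated in the input.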