R regex remove apostrophes NOT between letters
I'm able to remove all punctuation from a string while keeping apostrophes, but I'm now stuck on how to remove any apostrophes that are not between two letters.
str1 <- "I don't know 'how' to remove these ' things"
Should look like this:
"I don't know how to remove these things"
You may use a regex approach:
str1 <- "I don't know 'how' to remove these ' things"
gsub("\\s*'\\B|\\B'\\s*", "", str1)
See this IDEONE demo and a regex demo.
The regex matches:
\\s*'\\B - 0+ whitespaces, ' and a non-word boundary
| - or
\\B'\\s* - a non-word boundary, ' and 0+ whitespaces
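To see how the two alternatives behave, here is a minimal sketch that applies the same pattern to a few made-up inputs. The `tests` vector is illustrative only and is not from the question. Contractions survive because a word boundary sits on both sides of their apostrophe, so neither \\B can match there.

```r
# Pattern from the answer: an apostrophe with a non-word boundary on at
# least one side, removed together with the adjacent whitespace.
pat <- "\\s*'\\B|\\B'\\s*"

# Hypothetical inputs, chosen to exercise each alternative.
tests <- c("I don't", "say 'how' now", "these ' things", "rock 'n' roll")

res <- gsub(pat, "", tests)
print(res)
```

Apostrophes at the very start or end of a string are deliberately left out of this sketch; how \\B is treated at the string edges can differ between regex engines, so check that case separately if it matters.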
If you do not need to care about the extra whitespace that can remain after removing a standalone ', you can use a PCRE regex like
\b'\b(*SKIP)(*F)|'
See the regex demo.
Explanation:
\\b'\\b - match a ' in-between word characters
(*SKIP)(*F) - and omit the match
| - or match...
' - an apostrophe in another context.
See an IDEONE demo:
gsub("\\b'\\b(*SKIP)(*F)|'", "", str1, perl=TRUE)
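As a rough illustration of the whitespace caveat, the sketch below runs the (*SKIP)(*F) pattern on the question's string and then collapses the leftover spaces. The second pass and the names `step1` and `step2` are additions for this example, not part of the original answer.

```r
str1 <- "I don't know 'how' to remove these ' things"

# Apostrophes flanked by word characters are skipped; all others are removed.
step1 <- gsub("\\b'\\b(*SKIP)(*F)|'", "", str1, perl = TRUE)

# The standalone apostrophe leaves two adjacent spaces behind,
# so an optional second pass collapses runs of spaces.
step2 <- gsub(" {2,}", " ", step1)

print(step1)
print(step2)
```

Note that \\b treats digits and underscores as word characters too, so an apostrophe in something like 5'10 would also be kept.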
To account for apostrophes in-between Unicode letters, add the (*UTF)(*UCP) flags at the start of the pattern and use a perl=TRUE argument:
gsub("(*UTF)(*UCP)\\s*'\\B|\\B'\\s*", "", str1, perl=TRUE)
      ^^^^^^^^^^^^                              ^^^^^^^^^
Or
gsub("(*UTF)(*UCP)\\b'\\b(*SKIP)(*F)|'", "", str1, perl=TRUE)
      ^^^^^^^^^^^^
See another IDEONE demo.
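A small sketch of why the flags matter, assuming a made-up French input (`str2` is hypothetical and written with \u escapes so that it is UTF-8 regardless of the file encoding). Without (*UCP), \\b only knows ASCII word characters, so the apostrophe in front of an accented letter may be treated as stray and removed; with the flags it should be kept.

```r
# Hypothetical input: "l'été 'chaud'" with the accented letters escaped.
str2 <- "l'\u00e9t\u00e9 'chaud'"

# ASCII-only notion of a word character (default for perl = TRUE).
ascii_only <- gsub("\\b'\\b(*SKIP)(*F)|'", "", str2, perl = TRUE)

# Unicode-aware word characters via the (*UTF)(*UCP) prefix.
unicode <- gsub("(*UTF)(*UCP)\\b'\\b(*SKIP)(*F)|'", "", str2, perl = TRUE)

print(ascii_only)
print(unicode)
```

Comparing the two printed results on your own data is a quick way to check whether the flags are needed.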
This method using gsub works:
gsub("(([^A-Za-z])'|'([^A-Za-z]))", "\\2 ", str1)
"I don't know  how to remove these   things"
It would require a second round to remove the extra spaces. So
gsub(" +", " ", gsub("(([^A-Za-z])'|'([^A-Za-z]))", "\\2 ", str1))
Here's one approach using lookarounds in base R:
gsub("(?<![a-zA-Z])(')|(')(?![a-zA-Z])", "", str1, perl=TRUE)
## [1] "I don't know how to remove these  things"
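The lookaround version can be wrapped in a small helper that also tidies the whitespace left behind. This is only a sketch: the function name `strip_stray_apostrophes` is hypothetical, the capture groups are dropped because the replacement does not use them, and the space-collapsing step is an addition to the pattern above.

```r
# Hypothetical helper built around the lookaround pattern.
strip_stray_apostrophes <- function(x) {
  # Remove any apostrophe that lacks an ASCII letter on either side.
  out <- gsub("(?<![a-zA-Z])'|'(?![a-zA-Z])", "", x, perl = TRUE)
  # Collapse runs of spaces left behind and trim both ends.
  trimws(gsub(" {2,}", " ", out))
}

cleaned <- strip_stray_apostrophes(c(
  "I don't know 'how' to remove these ' things",
  "'quoted' at both ends'"
))
print(cleaned)
```

Because the lookarounds test for letters rather than word boundaries, apostrophes at the start or end of the string are removed as well, and an apostrophe next to a digit counts as stray.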