R regex: remove apostrophes except the ones preceded and followed by a letter
I'm cleaning a text and I'd like to remove every apostrophe except the ones preceded and followed by letters, as in: i'm, i'll, he's, etc.
I have the following preliminary solution, which handles many cases, but I want a better one:
rmAps <- function(x) gsub("^\'+| \'+|\'+ |[^[:alpha:]]\'+(a-z)*|\\b\'*$", " ", x)
rmAps("'i'm '' ' 'we end' '")
[1] " i'm we end "
I also tried:
(?<![a-z])'(?![a-z])
But I think I am still missing something.
gsub("'(?!\\w)|(?<!\\w)'", "", x, perl = TRUE)
#[1] "i'm we end "
Remove occurrences where your character is not followed by a word character: '(?!\\w).
Remove occurrences where your character is not preceded by a word character: (?<!\\w)'.
If either of those situations occurs, you want to remove it, so '(?!\\w)|(?<!\\w)' should do the trick. Just note that \\w includes the underscore (and digits), and adjust as necessary.
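A minimal sketch of the first answer's pattern in action, including the underscore caveat. The function names `rm_aps` and `rm_aps_alpha` are illustrative only, and treating "letter" as `[[:alpha:]]` is an assumption about what you want to keep:

```r
# First answer: drop any apostrophe that is not followed by, or not
# preceded by, a word character (\w = letters, digits and underscore)
rm_aps <- function(x) gsub("'(?!\\w)|(?<!\\w)'", "", x, perl = TRUE)

# Hypothetical letters-only variant (assumption: "letter" = [[:alpha:]]),
# so an apostrophe next to a digit or underscore is removed as well
rm_aps_alpha <- function(x) {
  gsub("'(?![[:alpha:]])|(?<![[:alpha:]])'", "", x, perl = TRUE)
}

x <- "'i'm '' ' 'we end' '"
rm_aps(x)              # "i'm   we end " (the spaces between removed quotes remain)
rm_aps("x_'_y")        # "x_'_y": kept, because _ counts as \w
rm_aps_alpha("x_'_y")  # "x__y"
```

If collapsing the leftover runs of spaces matters, that is a separate cleanup step.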
Another option is:
gsub("\\w'\\w(*SKIP)(*FAIL)|'", "", x, perl = TRUE)
In this case, you match any instance where ' is surrounded by word characters (\\w'\\w) and then force that match to fail with (*SKIP)(*FAIL). But you also look for ' using |'. The result is that only occurrences of ' not wrapped in word characters are matched and removed.
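A sketch comparing the (*SKIP)(*FAIL) pattern with the lookaround one (`rm_aps_skip` is a hypothetical name). Both give the same result on the question's string, but they can differ when two in-word apostrophes share a single letter between them, as in "rock'n'roll":

```r
# Second answer: match word-char ' word-char, then discard that match with
# (*SKIP)(*FAIL); every other apostrophe is caught by the |' branch
rm_aps_skip <- function(x) gsub("\\w'\\w(*SKIP)(*FAIL)|'", "", x, perl = TRUE)

x <- "'i'm '' ' 'we end' '"
rm_aps_skip(x)  # "i'm   we end ", same as the lookaround version

# After "k'n" is skipped, the next attempt starts at the second apostrophe,
# where \w'\w cannot match, so the plain ' branch removes it
rm_aps_skip("rock'n'roll")                                   # "rock'nroll"
gsub("'(?!\\w)|(?<!\\w)'", "", "rock'n'roll", perl = TRUE)  # "rock'n'roll"
```

Whether that edge case matters depends on your text; the lookaround version checks each apostrophe independently, so it keeps both.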
You can use the following regular expression, which matches apostrophes that have a word character on both sides:
(?<=\w)'(?=\w)
(?<=) is a positive lookbehind: everything inside it needs to match before the next selector.
(?=) is a positive lookahead: everything inside it needs to match after the previous selector.
\w matches any alphanumeric character and the underscore.
You could also switch \w to e.g. [a-zA-Z] if you want to restrict the results.
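Because this pattern selects the apostrophes to keep, it is suited to finding them rather than being passed directly to gsub(). A short sketch using the [a-zA-Z] restriction mentioned above (`keep_pat` and the sample string are made up for illustration):

```r
# Third answer's pattern, with \w narrowed to ASCII letters.
# It matches the apostrophes that should be KEPT.
keep_pat <- "(?<=[a-zA-Z])'(?=[a-zA-Z])"
s <- "he's 'ok' it's"

# 1-based positions of the matched apostrophes (inside "he's" and "it's")
as.integer(gregexpr(keep_pat, s, perl = TRUE)[[1]])  # 3 13

# Passing it straight to gsub() therefore deletes the wrong ones
gsub(keep_pat, "", s, perl = TRUE)  # "hes 'ok' its"
```

To delete the other apostrophes instead, negate each lookaround and join them with |, i.e. (?<![a-zA-Z])'|'(?![a-zA-Z]), which is the letters-only counterpart of the first answer's pattern.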
Here is your example on regex101 for live testing.