R regex to remove apostrophes except those preceded and followed by letters

Question

I'm cleaning a text and I'd like to remove every apostrophe except those preceded and followed by letters, as in: i'm, i'll, he's, etc.

I have the following preliminary solution, which handles many cases, but I want a better one:

rmAps <- function(x) gsub("^\'+| \'+|\'+ |[^[:alpha:]]\'+(a-z)*|\\b\'*$", " ", x)

rmAps("'i'm '' ' 'we end' '")
[1] " i'm   we end  "

I also tried:

(?<![a-z])'(?![a-z])

But I think I am still missing something.

Answer 1

gsub("'(?!\\w)|(?<!\\w)'", "", x, perl = TRUE)
#[1] "i'm   we end "

Remove occurrences where the apostrophe is not followed by a word character: '(?!\\w).

Remove occurrences where the apostrophe is not preceded by a word character: (?<!\\w)'.

If either of those situations occurs, you want to remove the apostrophe, so '(?!\\w)|(?<!\\w)' should do the trick. Just note that \\w also matches digits and the underscore, so adjust as necessary.
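
To make the underscore caveat concrete, here is a small sketch comparing \\w with an explicit letter class. The helper names rm_aps_word and rm_aps_alpha and the second sample string are made up for this illustration; they are not part of the original answer.

```r
# Hypothetical helpers for illustration only.
rm_aps_word <- function(x) gsub("'(?!\\w)|(?<!\\w)'", "", x, perl = TRUE)

# Same idea, but only ASCII letters count as "inside a word".
rm_aps_alpha <- function(x) gsub("'(?![a-zA-Z])|(?<![a-zA-Z])'", "", x, perl = TRUE)

x <- "'i'm '' ' 'we end' '"
rm_aps_word(x)
#[1] "i'm   we end "
rm_aps_alpha(x)
#[1] "i'm   we end "

# The two differ once digits or underscores sit next to an apostrophe.
y <- "he's _'x 5'6"
rm_aps_word(y)   # unchanged: _ and digits match \w
#[1] "he's _'x 5'6"
rm_aps_alpha(y)  # only the apostrophe in he's survives
#[1] "he's _x 56"
```

If the text contains accented letters, [[:alpha:]] may be a better fit than [a-zA-Z], though its behaviour depends on the locale.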

Another option is

gsub("\\w'\\w(*SKIP)(*FAIL)|'", "", x, perl = TRUE)

In this case, you match any instance where ' is surrounded by word characters, \\w'\\w, and then force that match to fail with (*SKIP)(*FAIL). The second alternative, |', then matches any remaining '. The result is that only occurrences of ' not wrapped in word characters are matched and substituted out.
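
A side-by-side sketch of the two approaches may help. The function names rm_lookaround and rm_skip_fail and the string z are made up for this comparison. On the question's input both give the same result, but they appear to differ when apostrophes overlap, because \\w'\\w consumes the characters on both sides.

```r
# Hypothetical wrappers around the two patterns from this answer.
rm_lookaround <- function(x) gsub("'(?!\\w)|(?<!\\w)'", "", x, perl = TRUE)
rm_skip_fail  <- function(x) gsub("\\w'\\w(*SKIP)(*FAIL)|'", "", x, perl = TRUE)

x <- "'i'm '' ' 'we end' '"
rm_lookaround(x)
#[1] "i'm   we end "
rm_skip_fail(x)
#[1] "i'm   we end "

# Overlapping case: after "a'b" is skipped, matching resumes at the second
# apostrophe, where \w'\w can no longer start, so the bare ' alternative wins.
z <- "a'b'c"
rm_lookaround(z)  # lookarounds consume nothing, so both apostrophes are kept
#[1] "a'b'c"
rm_skip_fail(z)
#[1] "a'bc"
```

For ordinary contractions this difference rarely matters, but the lookaround version is the safer choice if such overlapping input is possible.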

Answer 2

You can use the following regular expression:

(?<=\w)'(?=\w)

(?<=...) is a positive lookbehind: the pattern inside must match immediately before the current position.
(?=...) is a positive lookahead: the pattern inside must match immediately after the current position.
\w matches any alphanumeric character or the underscore.

You could also switch \w to e.g. [a-zA-Z] if you want to restrict the results.
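
Note that this pattern matches the apostrophes to keep, not the ones to remove. A quick sketch in R shows what it picks out, assuming the question's sample input; the backslashes are doubled inside the R string, and the second test string is made up.

```r
x <- "'i'm '' ' 'we end' '"

# Find apostrophes that have a word character on both sides.
m <- gregexpr("(?<=\\w)'(?=\\w)", x, perl = TRUE)
regmatches(x, m)[[1]]
#[1] "'"
as.integer(m[[1]])  # 1-based position: the apostrophe in i'm
#[1] 3

# Letter-only variant: the apostrophe in 5'6 is no longer matched.
m2 <- gregexpr("(?<=[a-zA-Z])'(?=[a-zA-Z])", "he's 5'6", perl = TRUE)
as.integer(m2[[1]])
#[1] 3
```

To delete everything else, the condition has to be inverted, for example with negative lookarounds.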

→ Here is your example on regex101 for live testing.
