R regex: remove apostrophes except those preceded and followed by a letter

Question

I'm cleaning a text and I'd like to remove every apostrophe except those preceded and followed by letters, as in: i'm, i'll, he's, etc.

I have the following preliminary solution, which handles many cases, but I want a better one:

rmAps <- function(x) gsub("^\'+| \'+|\'+ |[^[:alpha:]]\'+(a-z)*|\\b\'*$", " ", x)

rmAps("'i'm '' ' 'we end' '")
[1] " i'm   we end  "

I also tried:

(?<![a-z])'(?![a-z])

But I think I am still missing something.

Answer 1

x <- "'i'm '' ' 'we end' '"
gsub("'(?!\\w)|(?<!\\w)'", "", x, perl = TRUE)
#[1] "i'm   we end "

Remove the apostrophe whenever it is not followed by a word character: '(?!\\w).

Remove the apostrophe whenever it is not preceded by a word character: (?<!\\w)'.

If either of those situations occurs, you want to remove it, so '(?!\\w)|(?<!\\w)' should do the trick. Just note that \\w also matches digits and the underscore, so adjust as necessary.
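
To illustrate the \\w caveat, here is a minimal sketch that swaps \\w for [[:alpha:]] so that only letters protect an apostrophe. The function name rm_aps_alpha and the second test string are made up for illustration.

```r
# Letters-only variant of the lookaround pattern: [[:alpha:]] replaces \\w,
# so digits and underscores no longer protect an apostrophe.
rm_aps_alpha <- function(x) {
  gsub("'(?![[:alpha:]])|(?<![[:alpha:]])'", "", x, perl = TRUE)
}

x <- "'i'm '' ' 'we end' '"
rm_aps_alpha(x)
#[1] "i'm   we end "

# Hypothetical input with apostrophes next to an underscore and digits
y <- "a'_b 5'6 he's"
gsub("'(?!\\w)|(?<!\\w)'", "", y, perl = TRUE)  # \\w keeps all three apostrophes
#[1] "a'_b 5'6 he's"
rm_aps_alpha(y)                                   # only the one in he's survives
#[1] "a_b 56 he's"
```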

Another option is

gsub("\\w'\\w(*SKIP)(*FAIL)|'", "", x, perl = TRUE)

In this case, you match any instance where ' is surrounded by word characters, \\w'\\w, and then force that match to fail with (*SKIP)(*FAIL). The alternation |' then matches any remaining '. The result is that only occurrences of ' not wrapped in word characters are matched and removed.
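
As a quick check, the two patterns can be compared on the sample string from the question. The sketch below also tries a made-up string with back-to-back apostrophes, "a'b'c", where the two approaches can differ because (*SKIP) consumes the letter after the first apostrophe. Treat it as something to verify on your own data rather than as a guarantee.

```r
x <- "'i'm '' ' 'we end' '"

lookaround <- gsub("'(?!\\w)|(?<!\\w)'", "", x, perl = TRUE)
skip_fail  <- gsub("\\w'\\w(*SKIP)(*FAIL)|'", "", x, perl = TRUE)
identical(lookaround, skip_fail)
#[1] TRUE

# Hypothetical edge case: the "b" is shared by two apostrophes
z <- "a'b'c"
gsub("'(?!\\w)|(?<!\\w)'", "", z, perl = TRUE)       # lookarounds consume nothing
#[1] "a'b'c"
gsub("\\w'\\w(*SKIP)(*FAIL)|'", "", z, perl = TRUE)  # "a'b" is skipped, then "'" matches
#[1] "a'bc"
```

If chained apostrophes like this can occur in your text, the lookaround version is probably the safer choice.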

Answer 2

You can use the following regular expression, which matches the apostrophes that sit between two word characters:

(?<=\w)'(?=\w)

(?<=...) is a positive lookbehind: its contents must match immediately before the current position.
(?=...) is a positive lookahead: its contents must match immediately after the current position.
\w matches any alphanumeric character or the underscore.

You could also replace \w with, e.g., [a-zA-Z] if you want to restrict the matches to letters.
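
Note that this pattern matches the apostrophes to keep rather than the ones to remove. One possible way to use it for the cleanup, sketched under that assumption, is to locate the protected apostrophes with gregexpr() and drop every other one. The helper name rm_aps_keep is hypothetical, and the sketch assumes x is a single string.

```r
rm_aps_keep <- function(x) {
  # Positions of apostrophes flanked by letters (-1 if there are none)
  keep <- as.integer(gregexpr("(?<=[a-zA-Z])'(?=[a-zA-Z])", x, perl = TRUE)[[1]])
  chars <- strsplit(x, "")[[1]]
  # Every apostrophe whose position is not in the protected set gets dropped
  drop <- setdiff(which(chars == "'"), keep)
  paste(chars[!(seq_along(chars) %in% drop)], collapse = "")
}

rm_aps_keep("'i'm '' ' 'we end' '")
#[1] "i'm   we end "
```

In R the backslashes must be doubled inside a string literal, which is why a \w version would be written as "(?<=\\w)'(?=\\w)" with perl = TRUE.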

→ Here is your example on regex101 for live testing.

solution1
2 ACCEPTED 2017-01-29 07:56:03

solution2
1 2017-01-28 21:55:54
