Split string using regular expression, how to ignore apostrophe?

Question

I am doing a spell check tutorial in Python and it uses this regular expression:

import re
def split_line(line):
    return re.findall('[A-Za-z]+(?:\`[A-Za-z)+)?',line)

How can I change this function so that it ignores the apostrophe? That is, if I input the string he's, I want to get ["he's"] and not ['he', 's'].

Answer 1

First you'll need to fix the original expression by replacing ) with ] in the second character class, as mentioned by Marcin. Then simply add ' to the list of allowed characters (escaped with a backslash, because the pattern is a single-quoted string):

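import re

def split_line(line):
    # \' escapes the apostrophe inside the single-quoted string literal;
    # the backtick needs no escaping.
    return re.findall('[A-Za-z\']+(?:`[A-Za-z]+)?', line)

print(split_line("He's my hero"))
# ["He's", 'my', 'hero']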

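Of course, this does not handle edge cases where the apostrophe is at the beginning or end of a word: a quoted word such as 'hero' is returned with its quotes attached, and a stray ' on its own is returned as a token.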
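To make the edge cases concrete, here is a minimal sketch that runs the pattern above on a sentence with leading and trailing apostrophes. The STRICT pattern and the optional pattern argument are assumptions added for comparison, not part of the original answer; STRICT only accepts an apostrophe that sits between two letters.

```python
import re

# Pattern from the answer above: an apostrophe may appear anywhere in a token.
PERMISSIVE = r"[A-Za-z']+(?:`[A-Za-z]+)?"
# Hypothetical stricter variant: an apostrophe must be followed by more letters.
STRICT = r"[A-Za-z]+(?:'[A-Za-z]+)*"

def split_line(line, pattern=PERMISSIVE):
    """Return all tokens in line that match the given pattern."""
    return re.findall(pattern, line)

sample = "'Tis the dogs' bone, isn't it?"

print(split_line(sample))
# ["'Tis", 'the', "dogs'", 'bone', "isn't", 'it']

print(split_line(sample, STRICT))
# ['Tis', 'the', 'dogs', 'bone', "isn't", 'it']
```

Whether a trailing apostrophe as in dogs' should be kept depends on the spell checker's dictionary, so neither variant is right for every use.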

Answer 2

Your regex is supposed to match one or more letters, optionally followed by a backtick and one or more letters again. You can put the backtick into a character class and add ' to that class, so that either character can join two runs of letters.

Note that you do not need to escape ' if you use a double-quoted string literal:

re.findall(r"[A-Za-z]+(?:['`][A-Za-z]+)*", line)

See the regex demo. Details:

[A-Za-z]+ - one or more ASCII letters (use [^\W\d_]+ to match one or more Unicode letters)
(?:['`][A-Za-z]+)* - zero or more occurrences of ' or a backtick followed by one or more ASCII letters.

See the Python demo:

import re
text = "And he's done it o`key!"
print(re.findall(r"[A-Za-z]+(?:['`][A-Za-z]+)*", text))
# => ['And', "he's", 'done', 'it', 'o`key']
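
The Unicode-letter variant mentioned in the details can be sketched the same way. This is an illustration under assumptions: the names ASCII_WORD, UNICODE_WORD and split_line_unicode are made up for the example, and it relies on Python 3, where str patterns are Unicode-aware by default, so [^\W\d_] matches any letter.

```python
import re

# ASCII-only pattern from the answer.
ASCII_WORD = re.compile(r"[A-Za-z]+(?:['`][A-Za-z]+)*")

# [^\W\d_] means "a word character that is neither a digit nor an underscore",
# which leaves letters, including non-ASCII ones.
UNICODE_WORD = re.compile(r"[^\W\d_]+(?:['`][^\W\d_]+)*")

def split_line_unicode(line):
    """Split line into words, keeping apostrophes and backticks inside words."""
    return UNICODE_WORD.findall(line)

sample = "п'ять is Ukrainian for five"

print(ASCII_WORD.findall(sample))
# ['is', 'Ukrainian', 'for', 'five']

print(split_line_unicode(sample))
# ["п'ять", 'is', 'Ukrainian', 'for', 'five']
```

Note that only the straight apostrophe ' and the backtick are treated as joiners here; a typographic apostrophe (U+2019) would still split the word unless it is added to the character class.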

2 answers

solution1
1 ACCEPTED 2015-02-27 07:46:02

solution2
0 2022-01-22 20:33:02
