Split string using regular expression, how to ignore apostrophe?
I am doing a spell check tutorial in Python and it uses this regular expression:
import re
def split_line(line):
    return re.findall('[A-Za-z]+(?:\`[A-Za-z)+)?',line)
I was wondering if you could help me change this function so that it ignores the apostrophe ('), i.e. if I input the string he's, I get ["he's"] and not ['he', 's'].
First you'll need to fix the original expression by replacing ) with ], as mentioned by Marcin. Then simply add ' to the list of allowed characters (escaped with a backslash):
import re
def split_line(line):
    return re.findall('[A-Za-z\']+(?:\`[A-Za-z]+)?',line)
split_line("He's my hero")
#["He's", 'my', 'hero']
Of course, this does not handle edge cases where the apostrophe is at the beginning or end of a word: such an apostrophe is simply kept as part of the match.
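To make that edge case concrete, here is a minimal sketch. PERMISSIVE is assumed to be equivalent to the pattern above, rewritten as a raw double-quoted string; STRICT is a hypothetical alternative (not part of this answer) that only accepts an apostrophe between letters.

```python
import re

# Assumed equivalent of the pattern above, written as a raw string.
PERMISSIVE = r"[A-Za-z']+(?:`[A-Za-z]+)?"
# Hypothetical stricter variant: an apostrophe must sit between letters.
STRICT = r"[A-Za-z]+(?:'[A-Za-z]+)*"

text = "'tis the dogs' bowl"

permissive_words = re.findall(PERMISSIVE, text)
strict_words = re.findall(STRICT, text)

print(permissive_words)  # leading/trailing apostrophes are kept
print(strict_words)      # leading/trailing apostrophes are dropped
```

Which behaviour is right depends on the spell checker's dictionary: a word list that contains entries such as 'tis would want the permissive form.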
Your regex is supposed to match one or more letters and then an optional occurrence of a backtick and again one or more letters. You can put the backtick into a character class and add ' to the class. Note that you do not need to escape ' if you use a double-quoted string literal:
re.findall(r"[A-Za-z]+(?:['`][A-Za-z]+)*", line)
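The pattern above also quantifies the group with * where the original used ?. A small sketch of the difference, with hypothetical function names chosen only for this comparison:

```python
import re

def split_line_star(line):
    # Group may repeat: any number of apostrophe/backtick + letters parts.
    return re.findall(r"[A-Za-z]+(?:['`][A-Za-z]+)*", line)

def split_line_optional(line):
    # Group may occur at most once, as in the original expression.
    return re.findall(r"[A-Za-z]+(?:['`][A-Za-z]+)?", line)

sample = "rock'n'roll isn't"
print(split_line_star(sample))
print(split_line_optional(sample))
```

With *, a word containing several apostrophes stays in one piece; with ?, it is cut after the second part.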
See the regex demo. Details:
[A-Za-z]+ - one or more ASCII letters (use [^\W\d_]+ to match any one or more Unicode letters)
(?:['`][A-Za-z]+)* - zero or more occurrences of ' or a backtick followed by one or more ASCII letters.
See the Python demo:
import re
text = "And he's done it o`key!"
print(re.findall(r"[A-Za-z]+(?:['`][A-Za-z]+)*", text))
# => ['And', "he's", 'done', 'it', 'o`key']
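The Unicode variant mentioned in the details can be sketched the same way. The names LETTER, UNICODE_WORD and split_line_unicode are made up for this example, and it assumes Python 3, where str patterns are Unicode-aware by default.

```python
import re

# A word character that is neither a digit nor an underscore,
# which is roughly "any Unicode letter".
LETTER = r"[^\W\d_]"
UNICODE_WORD = re.compile(rf"{LETTER}+(?:['`]{LETTER}+)*")

def split_line_unicode(line):
    # Same structure as the ASCII pattern, with LETTER in place of [A-Za-z].
    return UNICODE_WORD.findall(line)

print(split_line_unicode("Zoë's café o`key"))
```

Digits and underscores still act as separators here, just as they do with [A-Za-z].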