Regular expression: re.findall() for set of all alphabetic words
I'm trying to use the re.findall() function to extract all the alphabetic words from a sentence. Here's my code:
import re
s = 'Hello from the other side'
lst = re.findall('[:alpha:]', s)
print (lst)
Any suggestions on how I can change the code?
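Python's re module doesn't support POSIX character classes such as [:alpha:]. The pattern '[:alpha:]' is read as an ordinary character set containing ':', 'a', 'l', 'p', and 'h', so it matches those characters one at a time. Write this instead: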
re.findall(r'[A-Za-z]+', s)
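To make the difference concrete, here is a minimal sketch that runs both patterns on the sentence from the question. The variable names single_chars and words are only for illustration.

```python
import re

s = 'Hello from the other side'

# '[:alpha:]' is an ordinary character set: ':', 'a', 'l', 'p', 'h'.
# findall therefore returns single matching characters, not words.
single_chars = re.findall('[:alpha:]', s)
print(single_chars)  # ['l', 'l', 'h', 'h']

# An explicit ASCII letter range followed by '+' matches whole words.
words = re.findall(r'[A-Za-z]+', s)
print(words)  # ['Hello', 'from', 'the', 'other', 'side']
```

Note that the uppercase 'H' in 'Hello' is not in the accidental set, which is why only the lowercase 'h' characters show up in the first result.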
Avoid \w+, which accepts underscores and digits in addition to alphabetic characters. The only real advantage of \w+ is that it is not limited to ASCII letters: on str patterns in Python 3 it matches Unicode word characters, and on bytes patterns it works with the re.LOCALE flag.
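As a rough illustration of that trade-off, the sketch below compares the two patterns on a made-up sample string, assuming Python 3 and str patterns. The third pattern, [^\W\d_]+, is not part of the answer above; it is a commonly used idiom (word characters minus digits and underscore) shown here as one possible way to keep non-ASCII letters.

```python
import re

# Hypothetical sample text containing an underscore, a digit, and an accented letter.
text = 'user_name has 2 cats and a caf\u00e9'

# \w+ keeps underscores and digits, and (on str patterns) non-ASCII letters.
with_w = re.findall(r'\w+', text)
print(with_w)  # ['user_name', 'has', '2', 'cats', 'and', 'a', 'café']

# [A-Za-z]+ drops underscores and digits, but also cuts off the accented letter.
ascii_only = re.findall(r'[A-Za-z]+', text)
print(ascii_only)  # ['user', 'name', 'has', 'cats', 'and', 'a', 'caf']

# Word characters that are neither digits nor underscores: letters only.
letters_only = re.findall(r'[^\W\d_]+', text)
print(letters_only)  # ['user', 'name', 'has', 'cats', 'and', 'a', 'café']
```

Which of these is appropriate depends on whether the input is expected to contain only ASCII text.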
When I parse natural sentences to extract entire words, I usually expand the allowed characters to also allow hyphens and apostrophes:
re.findall(r"[A-Za-z\-\']+", s)
This will accept words like "don't" and "re-invent" and "cul-de-sac" but will exclude digits, underscores, whitespace, double quote marks, and other punctuation.
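A short sketch of this pattern on a made-up sentence follows. The stricter variant at the end is an assumption of mine, not part of the answer: it requires a letter on both sides of each hyphen or apostrophe, so stray punctuation is not returned as a "word".

```python
import re

# Hypothetical sample sentence with digits, an underscore, and double quotes.
sentence = "Don't re-invent the cul-de-sac: 42 snake_case \"words\"!"

words = re.findall(r"[A-Za-z\-\']+", sentence)
print(words)
# ["Don't", 're-invent', 'the', 'cul-de-sac', 'snake', 'case', 'words']

# The pattern also matches a hyphen or apostrophe standing on its own.
loose = re.findall(r"[A-Za-z\-\']+", "rock 'n' roll - yes")
print(loose)  # ['rock', "'n'", 'roll', '-', 'yes']

# Stricter variant (an assumption): letters, optionally joined by single
# hyphens or apostrophes that are followed by more letters.
strict = re.findall(r"[A-Za-z]+(?:[-'][A-Za-z]+)*", "rock 'n' roll - yes")
print(strict)  # ['rock', 'n', 'roll', 'yes']
```

Excluded characters act as separators rather than causing a whole token to be dropped, which is why "snake_case" comes back as two words.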