Regular expression: re.findall() for set of all alphabetic words

I'm trying to use the re.findall() function to extract all the alphabetic words from a sentence. Here's my code:

import re
s = 'Hello from the other side'
lst = re.findall('[:alpha:]', s)
print(lst)

Any suggestions on how I can change the code?

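Python's re module doesn't support POSIX character classes such as [:alpha:]. Inside square brackets, the pattern is read as a plain set of the characters ':', 'a', 'l', 'p' and 'h'. Write this instead: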

re.findall(r'[A-Za-z]+', s)
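
To see the difference on the sentence from the question, here is a minimal sketch. The variable names `posix_attempt` and `words` are only illustrative, and the warning filter is there because recent Python versions emit a FutureWarning for the `[:` sequence.

```python
import re
import warnings

s = 'Hello from the other side'

# Python's re treats '[:alpha:]' as an ordinary character set containing
# ':', 'a', 'l', 'p' and 'h', so it matches single characters, not words.
with warnings.catch_warnings():
    warnings.simplefilter('ignore')  # silence the "possible nested set" warning
    posix_attempt = re.findall('[:alpha:]', s)

# An explicit ASCII letter range followed by '+' matches whole words.
words = re.findall(r'[A-Za-z]+', s)

print(posix_attempt)  # ['l', 'l', 'h', 'h']
print(words)          # ['Hello', 'from', 'the', 'other', 'side']
```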

Avoid \w+, which accepts underscores and digits in addition to alphabetic characters. The only real advantage of \w+ is that it is not limited to ASCII letters: it works with the re.LOCALE flag (which Python 3 allows only with bytes patterns), and for str patterns in Python 3 it matches Unicode word characters by default.
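
A quick comparison of the two patterns, as a sketch on a made-up sample string (`text` and the result names are hypothetical, chosen only for this illustration):

```python
import re

# Sample input containing an underscore, a digit and a non-ASCII letter.
text = 'user_name has 2 cats at the caf\u00e9'

# \w+ keeps underscores and digits, and (for str patterns in Python 3)
# also matches non-ASCII letters such as 'é'.
with_w = re.findall(r'\w+', text)

# [A-Za-z]+ keeps only ASCII letters, so it splits at '_' and drops
# the digit, but it also cuts the word off before 'é'.
ascii_only = re.findall(r'[A-Za-z]+', text)

print(with_w)      # ['user_name', 'has', '2', 'cats', 'at', 'the', 'café']
print(ascii_only)  # ['user', 'name', 'has', 'cats', 'at', 'the', 'caf']
```

Which behavior is preferable depends on the input: for plain English text the explicit range is stricter, while text with accented letters may need a different approach.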

When I parse natural sentences to extract entire words, I usually expand the allowed characters to also allow hyphens and apostrophes:

re.findall(r"[A-Za-z\-\']+", s)

This will accept words like "don't", "re-invent" and "cul-de-sac", but will skip digits, underscores, whitespace, double quote marks, and other punctuation.
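
Here is a small sketch of that pattern in use. The name `WORD_RE` and the sample sentence are invented for illustration, not part of the original answer.

```python
import re

# Letters plus hyphen and apostrophe; the hyphen is escaped so it is not
# read as a range, and the apostrophe needs no escape inside a
# double-quoted raw string.
WORD_RE = re.compile(r"[A-Za-z\-']+")

sentence = "Don't re-invent the cul-de-sac, said agent_007 twice."
found = WORD_RE.findall(sentence)

print(found)
# ["Don't", 're-invent', 'the', 'cul-de-sac', 'said', 'agent', 'twice']

# Caveat: a hyphen or apostrophe standing on its own also matches.
stray = WORD_RE.findall('a - b')
print(stray)  # ['a', '-', 'b']
```

If stray punctuation like that matters for your input, one option is to filter out matches that contain no letters after the call to findall().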
