How do you write a regular expression in Python to find all words that contain only letters, digits, and underscores?

Question

This is the best I was able to come up with:

b = re.findall(r'\b[a-zA-Z0-9_]\b', 'ahz2gb_ $f heyght78_')

But that doesn't work. Also, note that I'm only interested in regexes at the moment. I can solve the problem the long way.

The expected result is a list containing [ahz2gb_, heyght78_]

Answer 1

There is \w for matching those characters, and you need to allow more than one character with + :

b = re.findall(r'\b\w+\b', 'ahz2gb_ $f heyght78_')
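
To see why the quantifier matters, here is a small sketch that runs the pattern from the question and the \w+ version on the same sample string. The variable names are only illustrative.

```python
import re

text = 'ahz2gb_ $f heyght78_'

# Without a quantifier the class matches exactly one character, so only
# one-character words can sit between the two \b boundaries.
single = re.findall(r'\b[a-zA-Z0-9_]\b', text)

# With + the class may repeat, so whole runs of word characters match.
words = re.findall(r'\b\w+\b', text)

print(single)  # ['f']
print(words)   # ['ahz2gb_', 'f', 'heyght78_']
```

Note that f is still returned here, because \b also sees a boundary between $ and f.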

As + is greedy, you don't really need the \b either:

b = re.findall(r'\w+', 'ahz2gb_ $f heyght78_')
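
One caveat, as far as I know: in Python 3, \w in a str pattern also matches non-ASCII letters and digits. If "letters" should mean a-z and A-Z only, the re.ASCII flag or the explicit class from the question narrows it. A minimal sketch; the sample string is made up for illustration.

```python
import re

# Hypothetical input containing one non-ASCII letter (e with acute accent).
sample = 'caf\u00e9_1 ahz2gb_'

# Default (Unicode) matching treats the accented letter as a word character.
unicode_words = re.findall(r'\w+', sample)

# re.ASCII restricts \w to [a-zA-Z0-9_], so the accented letter splits the word.
ascii_words = re.findall(r'\w+', sample, flags=re.ASCII)

# The explicit character class behaves like the ASCII-only version.
explicit = re.findall(r'[a-zA-Z0-9_]+', sample)

print(unicode_words)
print(ascii_words)
print(explicit)
```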

If you need words to be split by white space only (not \b ), then you can use look-around:

b = re.findall(r'(?<!\S)\w+(?!\S)', 'ahz2gb_ $f heyght78_')

The (?<! sequence is a negative lookbehind: it asserts that the pattern following (?<! does not occur immediately before the current matching position in the target string. So in this case (?<!\S) means: there must not be a non-white-space character immediately before the match.

(?! is similar, but looks ahead instead: it asserts that the pattern does not follow the current position, without consuming any characters.
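
A quick sketch of the difference the look-arounds make on the sample string, comparing them with the plain \w+ pattern. The variable names are just for illustration.

```python
import re

text = 'ahz2gb_ $f heyght78_'

# Plain \w+ picks up the f in $f, since $ is simply skipped over.
boundary = re.findall(r'\w+', text)

# The look-arounds require white space (or the string edge) on both sides,
# so f is rejected because $ comes right before it.
whitespace_only = re.findall(r'(?<!\S)\w+(?!\S)', text)

print(boundary)         # ['ahz2gb_', 'f', 'heyght78_']
print(whitespace_only)  # ['ahz2gb_', 'heyght78_']
```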

Answer 2

A regex that is simple to understand would be:

^[0-9a-zA-Z_]+$ : strictly digits, letters and underscores
^[0-9a-zA-Z_ ]+$ : strictly digits, letters, underscores and spaces

If you need the words from the matched lines, then split them using a space as the delimiter.

You can try Python regexes online at http://pythex.org/

Sample run in IDLE

>>> import re
>>> re.findall(r'^[a-zA-Z0-9_ ]+$', 'ahz2gb_ f heyght78_')[0].split(' ')
['ahz2gb_', 'f', 'heyght78_']
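
Be aware that the one-liner above only works when the whole line is clean: on the question's string, which contains $, findall returns an empty list and indexing it with [0] raises IndexError. A hedged sketch of a safer variant; the helper name words_if_clean is hypothetical.

```python
import re

LINE = re.compile(r'^[a-zA-Z0-9_ ]+$')

def words_if_clean(line):
    """Return the space-separated words, or [] if the line has other characters."""
    if LINE.match(line) is None:
        return []
    return line.split(' ')

print(words_if_clean('ahz2gb_ f heyght78_'))   # ['ahz2gb_', 'f', 'heyght78_']
print(words_if_clean('ahz2gb_ $f heyght78_'))  # []
```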

EDIT : Given the new requirement of keeping only the valid words, here is how you can achieve that.

import re

# Split on spaces, then keep only the tokens made up entirely of [0-9a-zA-Z_].
mylist = 'ahz2gb_ $f heyght78_'.split(' ')
r = re.compile(r"^[0-9a-zA-Z_]+$")
newlist = list(filter(r.match, mylist))
print(newlist)

I wish I could shorten it!

Sample run

========= RESTART: C:/regex.py =========
['ahz2gb_', 'heyght78_']
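
For what it's worth, one possible way to shorten the split-and-filter approach is a list comprehension with re.fullmatch (available since Python 3.4), which makes the ^ and $ anchors unnecessary. This is only a sketch of the same idea, not a different technique.

```python
import re

text = 'ahz2gb_ $f heyght78_'

# fullmatch succeeds only if the entire token consists of the allowed characters.
newlist = [w for w in text.split() if re.fullmatch(r'[0-9a-zA-Z_]+', w)]

print(newlist)  # ['ahz2gb_', 'heyght78_']
```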

Answer 1
4 2017-08-01 08:36:17

Answer 2
2 Accepted 2017-08-01 08:39:08
