How do you write a regex in Python that finds all words which contain only letters, numbers and underscores?
This is the best I was able to come up with:
b = re.findall(r'\b[a-zA-Z0-9_]\b', 'ahz2gb_ $f heyght78_')
But that doesn't work. Also, note that I'm only interested in regexes at the moment. I can solve the problem the long way.
The expected result is a list containing [ahz2gb_, heyght78_]
There is \w for capturing those characters, and you need to allow more than one character with +:
b = re.findall(r'\b\w+\b', 'ahz2gb_ $f heyght78_')
As + is greedy, you don't really need the \b either:
b = re.findall(r'\w+', 'ahz2gb_ $f heyght78_')
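For the sample string from the question, a quick check (a minimal sketch; the variable names are chosen only for illustration) suggests that both patterns return the same list, and that both still pick up the f from $f, because a word boundary exists between $ and f:

```python
import re

text = 'ahz2gb_ $f heyght78_'

# \b matches between the non-word character '$' and the word character 'f',
# so 'f' is returned as a word by both patterns.
with_boundaries = re.findall(r'\b\w+\b', text)
without_boundaries = re.findall(r'\w+', text)

print(with_boundaries)
print(without_boundaries)
```

Note that in Python 3, \w on a str pattern also matches non-ASCII letters and digits; passing flags=re.ASCII restricts it to [a-zA-Z0-9_].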
If you need words to be split by white space only (not \b), then you can use look-around:
b = re.findall(r'(?<!\S)\w+(?!\S)', 'ahz2gb_ $f heyght78_')
The (?<! sequence is a negative look-behind: it asserts that the pattern that follows (?<! does not occur immediately before the current matching position in the target string. So in this case (?<!\S) means: there should not be a preceding non-white-space character.
Then (?! is similar, but looks forward (without consuming any characters).
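To see the two assertions at work on the sample string, here is a small sketch. The names pattern, words and same are illustrative only, and the split-based comparison is an assumed rough equivalent rather than part of the answer above:

```python
import re

text = 'ahz2gb_ $f heyght78_'

# (?<!\S): no non-white-space character directly before the match
# (?!\S):  no non-white-space character directly after the match
pattern = re.compile(r'(?<!\S)\w+(?!\S)')

# 'f' is rejected because it is directly preceded by '$'
words = pattern.findall(text)
print(words)

# Rough equivalent without look-around: split on white space,
# then keep only the tokens that consist entirely of \w characters
same = [tok for tok in text.split() if re.fullmatch(r'\w+', tok)]
print(same)
```

Neither assertion consumes characters, so the surrounding white space never becomes part of a match.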
A regex that is simple to understand would be:
^[0-9a-zA-Z_]+$ : strictly numbers, letters and underscore
^[0-9a-zA-Z_ ]+$ : strictly numbers, letters, underscore and spaces
If you need words from the matched lines, then split using space as the delimiter.
You can try Python regexes online at http://pythex.org/
Sample run in IDLE
>>> import re
>>> re.findall(r'^[a-zA-Z0-9_ ]+$', 'ahz2gb_ f heyght78_')[0].split(' ')
['ahz2gb_', 'f', 'heyght78_']
EDIT: Given the new requirement of only having words, here is how you can achieve the same.
import re
mylist = 'ahz2gb_ $f heyght78_'.split(' ')
r = re.compile("^[0-9a-zA-Z_]+$")
newlist = list(filter(r.match, mylist))
print(newlist)
I wish I could shorten it!
Sample run
========= RESTART: C:/regex.py =========
['ahz2gb_', 'heyght78_']
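One possible way to shorten the filter-based version, sketched under the assumption of Python 3.4 or later (where re.fullmatch is available): fullmatch requires the whole token to match, so the ^ and $ anchors are no longer needed, and a list comprehension takes the place of filter. The names pattern and words are only illustrative.

```python
import re

pattern = re.compile(r'[0-9a-zA-Z_]+')

# Split on white space, then keep tokens that match the pattern in full
words = [w for w in 'ahz2gb_ $f heyght78_'.split() if pattern.fullmatch(w)]
print(words)
```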