
Python regular expression findall

I want to find all two-word strings in Python. I created this:

#!/usr/bin/python
import re

string = 'a1 a2 a3 a5 a6'
# '.. ..' means: two arbitrary characters, a space, two arbitrary characters
search = re.findall('.. ..', string)
print(len(search))
for nk in search:
    print(nk)

I am getting: a1 a2, a3 a5. What I wanted: a1 a2, a2 a3, a3 a5, ... etc. Shouldn't findall search for all possible patterns? And why does it return a1 a2, a3 a5? Thank you.

It returns ['a1 a2', 'a3 a5'] because these are the only matches that can be found: findall returns non-overlapping matches, so after the first match the 'a1 a2' part is consumed and ' a3 a5 a6' is left. The next possible match is 'a3 a5', and ' a6' is left over and cannot be matched further.
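To make the consumption visible, here is a small sketch (not part of the original answer) that uses re.finditer, which yields the same matches as findall but also exposes their positions. The variable names text and spans are only illustrative.

```python
import re

text = 'a1 a2 a3 a5 a6'

# Collect each match together with its (start, end) position in the string.
spans = [(m.group(), m.span()) for m in re.finditer(r'.. ..', text)]

for matched, (start, end) in spans:
    # Each new search starts where the previous match ended,
    # so no character is ever part of two matches.
    print("%r covers positions %d to %d" % (matched, start, end))
```

The first match ends at position 5, so 'a2' has already been used up and cannot start a second match.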

'a1 a3', 'a1 a5' etc. cannot be found because these combinations don't occur in the string. Remember, you are searching for two arbitrary characters, followed by a space character, followed by two arbitrary characters.
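If only the adjacent, overlapping pairs from the question (a1 a2, a2 a3, a3 a5, ...) are wanted, one alternative that this answer does not use is to wrap the pattern in a lookahead. A lookahead matches without consuming any characters, so the next search can start inside the previous pair. This is a sketch under that assumption, and the name overlapping is only illustrative.

```python
import re

text = 'a1 a2 a3 a5 a6'

# (?=...) is a zero-width lookahead: it checks that the pattern matches
# at the current position but consumes nothing. The inner group captures
# the text, and findall returns the captured group for each position.
overlapping = re.findall(r'(?=(.. ..))', text)

print(overlapping)  # ['a1 a2', 'a2 a3', 'a3 a5', 'a5 a6']
```

This works here because every word is exactly two characters long; words of other lengths would need a different inner pattern.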

With

r = re.compile(r"(\S{2})(?:\s|$)")
pairs = r.findall("a1 a2 a3 a5 a6")

or

pairs = re.findall(r"(\S{2})(?:\s|$)", "a1 a2 a3 a5 a6")

you find all two-character combinations that are followed either by a whitespace character or by the end of the string: ['a1', 'a2', 'a3', 'a5', 'a6']. If you combine these, you will find all possible combinations:

# Pair every element with each element that comes after it
for ifirst in range(len(pairs) - 1):
    for second in pairs[ifirst + 1:]:
        print(" ".join((pairs[ifirst], second)))
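The nested loop can also be written with the standard library's itertools.combinations, which yields the same pairs in the same order. The sketch below additionally shows zip for the case where only neighbouring words should be paired, as in the question. The names all_combos and adjacent are illustrative and do not come from the original answer.

```python
import re
from itertools import combinations

pairs = re.findall(r"(\S{2})(?:\s|$)", "a1 a2 a3 a5 a6")

# Every two-element combination, in the order the nested loop produces them.
all_combos = [" ".join(combo) for combo in combinations(pairs, 2)]

# Only neighbouring words: pair each element with its successor.
adjacent = [" ".join(pair) for pair in zip(pairs, pairs[1:])]

for line in all_combos:
    print(line)
print(adjacent)
```

With five words, combinations yields 10 pairs, while zip yields the 4 adjacent ones.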
