I want to find all two-word strings in Python. I wrote this:
#!/usr/bin/python
import re
string = 'a1 a2 a3 a5 a6'
# '.. ..' means: two arbitrary characters, a space, two arbitrary characters
search = re.findall(r'.. ..', string)
print(len(search))
for nk in search:
    print(nk)
I am getting: a1 a2, a3 a5. What I wanted: a1 a2, a2 a3, a3 a5, etc. Shouldn't findall search for all possible matches? And why does it return only a1 a2 and a3 a5? Thank you.
It returns ['a1 a2', 'a3 a5'] because these are the only non-overlapping matches that can be found: findall scans from left to right, and each match consumes the characters it covers. After the first match, the 'a1 a2' part is gone and ' a3 a5 a6' is left. The next possible match is 'a3 a5', and ' a6' is left over and cannot be matched further.
'a1 a3', 'a1 a5' etc. cannot be found because these combinations don't occur in the string. Remember, you search for two arbitrary characters, followed by a space character, followed by two arbitrary characters.
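To make the consuming behaviour visible, here is a minimal sketch that prints the span of each match. It also shows one possible way to get the neighbouring pairs the question asks for (a1 a2, a2 a3, ...), assuming that overlapping matches are what is wanted: the pattern is wrapped in a lookahead, so nothing is consumed. The variable names `spans` and `overlapping` are only illustrative.

```python
import re

string = 'a1 a2 a3 a5 a6'

# Each match consumes the characters it covers, so the spans never overlap.
spans = [m.span() for m in re.finditer(r'.. ..', string)]
print(spans)

# Assumption: overlapping neighbours are wanted. The lookahead (?=...) itself
# matches an empty string, so the scan moves on one character at a time, and
# the capturing group inside it still returns the text that was seen.
overlapping = re.findall(r'(?=(\S\S \S\S))', string)
print(overlapping)
```

The sketch uses \S instead of . inside the lookahead so that a candidate cannot start or end on a space; this is a design choice, not something the original pattern required.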
With
r = re.compile(r"(\S{2})(?:\s|$)")
pairs = r.findall("a1 a2 a3 a5 a6")
or
pairs = re.findall(r"(\S{2})(?:\s|$)", "a1 a2 a3 a5 a6")
you find all 2-character combinations which are either followed by a space or by the end of the string: ['a1', 'a2', 'a3', 'a5', 'a6']. If you combine these, you will find all possible combinations:
# pair every item with each item that comes after it
for ifirst in range(len(pairs) - 1):
    for second in pairs[ifirst + 1:]:
        print(" ".join((pairs[ifirst], second)))
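The same pairing can probably be written more compactly with itertools.combinations from the standard library. The following is a sketch under that assumption, not part of the original answer; the name `combined` is illustrative. It yields the pairs in the same order as the nested loops.

```python
import re
from itertools import combinations

# Same regex as above: two non-space characters followed by whitespace or end.
pairs = re.findall(r"(\S{2})(?:\s|$)", "a1 a2 a3 a5 a6")

# combinations(pairs, 2) pairs every item with each later item, once.
combined = [" ".join(pair) for pair in combinations(pairs, 2)]
for item in combined:
    print(item)
```

With five tokens this gives ten strings, starting with a1 a2 and ending with a5 a6.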