Python regular expression findall using Lookahead

Question

I'm new to regular expressions and would like to understand how findall() and lookahead can be used to find all occurrences of a given pattern within a string. I am having problems with alternating characters. Here is an example of what I want:

s = 'ababa4abaab'
p = 'aba'
print([ s[i:i+len(p)] for i in range(len(s)) if s[i:i+len(p)]==p])
['aba', 'aba', 'aba']

Here is my attempt with findall():

import re
re.findall('aba', 'ababa4abaab')
['aba', 'aba']

It only returns 2 matches but I want all three. I read this tutorial but did not quite understand. I tried

re.findall('(?=aba)', 'ababa4abaab')
['', '', '']

Can someone please tell me how to use this lookahead concept in this case and provide a brief explanation of how it works?

Answer 1

I think you only need to check whether there is an 'ab' with an 'a' right after it. You don't need to match the whole 'aba'; you can use this lookahead:

ab(?=a)

which gives you 3 matches, each of them 'ab'.

You can also capture it in a group, then iterate over the matches and append 'a' to each one, so you end up with the desired text 'aba' for every match:

 (ab(?=a))
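As a minimal sketch of the approach described above, using the string from the question (the variable names prefixes and matches are just illustrative):

```python
import re

s = 'ababa4abaab'

# Match 'ab' only where an 'a' follows. The lookahead checks for that 'a'
# without consuming it, so it can still start the next match.
prefixes = re.findall(r'ab(?=a)', s)
print(prefixes)  # ['ab', 'ab', 'ab']

# Re-attach the 'a' that the lookahead verified but did not include.
matches = [m + 'a' for m in prefixes]
print(matches)   # ['aba', 'aba', 'aba']
```

Note that this only works because the pattern is known in advance and can be split by hand into a consumed part ('ab') and a looked-ahead part ('a').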

Answer 2

The official documentation for findall() says:

"Return a list of all non-overlapping matches in the string."
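That non-overlapping rule is why re.findall('aba', ...) returns only two matches: the first match consumes the 'a' that the second occurrence would need. The answer as quoted stops here, so the following is only a sketch of one common idiom, not necessarily what the author had in mind: put a capturing group inside the lookahead. The lookahead itself matches an empty string (which is why the attempt in the question returned ['', '', '']), but findall() returns the contents of the group when the pattern has one.

```python
import re

s = 'ababa4abaab'
p = 'aba'

# The lookahead consumes nothing, so the scan advances one character at a
# time and every start position is tested. The group inside it captures the
# text, and findall() returns the group instead of the empty match.
pattern = r'(?=(' + re.escape(p) + r'))'
overlapping = re.findall(pattern, s)
print(overlapping)  # ['aba', 'aba', 'aba']

# finditer() gives the start index of each overlapping occurrence.
starts = [m.start() for m in re.finditer(pattern, s)]
print(starts)       # [0, 2, 6]
```

re.escape() is used here so the sketch still works if p contains characters that are special in regular expressions.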

2 answers

solution1
0 2017-12-27 08:40:28

solution2
0 2017-12-27 09:29:45