Equivalent to (.*) in a negative lookbehind assertion in Python regex

Question

I am writing a negative lookbehind assertion expression in Python which performs the following function to parse a plain text file:

It should not match anything that follows http:// (that is, anything inside a link), but it should match the pattern when it is not inside an http:// link.

Example:
http://www.test.com/aa4   cd6
bx2 vq9 
yu9 http://www.bh9.com/cj3

Expected matches: cd6, bx2, vq9 and yu9

So I tried regexps like

r'(?<!http://(.*))([a-z][a-z][0-9])'
r'(?<!http://*)([a-z][a-z][0-9])'

They did not work.

How can I use .* or do a similar operation inside a negative lookbehind assertion in Python?

Answer 1

Problem: A lookbehind does not allow a pattern whose length is not fixed.

Quick hack: Perhaps the following regexp does the job?

r'(?<![./])[a-z][a-z][0-9]'

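It rejects any match that is immediately preceded by '.' or '/', which is how these tokens appear inside a URL. It works like this: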

>>> import re
>>> text = """http://www.test.com/aa4
... bx2 vq9 
... http://www.bh9.com/cj3
... """
>>> re.findall(r'(?<![./])[a-z][a-z][0-9]', text)
['bx2', 'vq9']

Or, as another solution, use a regexp that matches URLs to cut all URLs out of your string, and then search for r'[a-z][a-z][0-9]'.
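A minimal sketch of that second approach, run on the example from the question. The URL pattern here is an assumption: it treats any run of non-whitespace characters after http:// or https:// as part of the link, which may be too loose or too strict for real data. The names URL_RE, TOKEN_RE and tokens_outside_urls are made up for illustration.

```python
import re

# Assumed, deliberately simple URL pattern: "http://" or "https://"
# followed by any run of non-whitespace characters.
URL_RE = re.compile(r'https?://\S+')
# The pattern from the question: two lowercase letters and one digit.
TOKEN_RE = re.compile(r'[a-z][a-z][0-9]')

def tokens_outside_urls(text):
    """Return the tokens that are not part of a URL."""
    # Replace each URL with a space so the words around it stay separated.
    without_urls = URL_RE.sub(' ', text)
    return TOKEN_RE.findall(without_urls)

sample = """http://www.test.com/aa4   cd6
bx2 vq9 
yu9 http://www.bh9.com/cj3
"""
print(tokens_outside_urls(sample))  # ['cd6', 'bx2', 'vq9', 'yu9']
```

Unlike the lookbehind hack, this does not depend on which character happens to precede the token, but it is only as good as the URL pattern you choose.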

Answer 2

That is not possible. Python's re module allows only fixed-length lookbehinds. That means no variable-length quantifier such as * or + inside the lookbehind.
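The restriction is easy to check for yourself. The sketch below only tests whether a pattern compiles with the standard re module; the helper name compiles is made up for illustration.

```python
import re

def compiles(pattern):
    """Return True if re accepts the pattern, False if it raises re.error."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Variable-length lookbehind (contains .*): rejected by re.
print(compiles(r'(?<!http://.*)[a-z][a-z][0-9]'))  # False
# Fixed-length lookbehinds: accepted.
print(compiles(r'(?<!http://)[a-z][a-z][0-9]'))    # True
print(compiles(r'(?<![./])[a-z][a-z][0-9]'))       # True
```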

See the feature list on regular-expressions.info.