Is regex always greedy even when I give it look ahead and look behind requirements?

Question

I have an re.sub call which substitutes certain values in between commas in text_string:

re.sub('(?:(?<=\,)|(?<=^))[^\w\d\r\n\t]*(HUN)[^\w\d\r\n\t]*(?=(?:\,|$))','',text_string,flags=re.IGNORECASE)

which replaces HUN with nothing.

I run this on many files. Sometimes the files are huge, sometimes they are small. Occasionally, I get a MemoryError from the re.py library. What is the best way to split up this execution so that I will not get a MemoryError?

I'm afraid that the regex is looking at the ENTIRE string first (e.g. if text_string is t,w,g,g,hun,t,w), before looking between the commas, instead of just looking between the commas (i.e. in a non-greedy way). Does anyone know how this actually gets evaluated?

If the string is super long, does the regex know to evaluate just between the commas in a non-greedy way? Thanks.

Answer 1

Your pattern is really weird.

(?:(?<=\,)|(?<=^)) - this can be turned into a regular non-capturing group, (?:,|^), provided the matched comma is captured and put back by the replacement.
[^\w\d] - since \w already matches everything \d matches, \d is redundant.
[^\w\r\n\t]* - matches punctuation(!), and thus commas too. That makes it hard for the regex engine to analyze strings that have many comma-separated values before your hun.
(?=(?:,|$)) - the lookahead makes sense if you plan to match overlapping strings (for example, two consecutive hun fields sharing a comma); otherwise, you can replace it with (?:,|$).
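
The two claims about the character class can be checked directly. This is only a small sketch with made-up sample strings (they are not from the question); it shows that dropping \d changes nothing and that the negated class happily runs across commas:

```python
import re

# \w already covers digits, so adding \d to the negated class changes nothing.
sample = "a1,_ -"
assert re.findall(r"[^\w\d]", sample) == re.findall(r"[^\w]", sample)

# The class only excludes word characters, CR, LF and tab,
# so commas and other punctuation are consumed by it.
filler = re.compile(r"[^\w\r\n\t]*")
run = filler.match(",,, ;;hun")
print(repr(run.group()))
```

Because the class can swallow commas, the engine has more ways to try to match around each field boundary, which is the extra work the list above refers to.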

I suggest:

r"(?i)(?:,|^)[^\w\r\n\t]*(HUN)[^\w\r\n\t]*(?=(?:,|$))"

Python demo:

import re
s = ",WWWWWW,hun,hun,WWWWW,"
# Groups 1 and 2 keep the leading comma and any surrounding punctuation; only HUN is dropped.
print(re.sub(r"(?i)((?:,|^)[^\w\r\n\t]*)HUN([^\w\r\n\t]*)(?=(?:,|$))", r"\1\2", s))
# => ,WWWWWW,,,WWWWW,
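
As for splitting up the work: one possible approach, sketched here under the assumption that every line of the file is an independent comma-separated record, is to run the substitution line by line instead of on the whole text. The function name strip_hun_lines and the sample data are hypothetical. Note that this changes the meaning of ^ and $ slightly, since they now match at the start and end of each line rather than of the whole string.

```python
import io
import re

# Same pattern as in the demo, compiled once and reused for every line.
PATTERN = re.compile(r"(?i)((?:,|^)[^\w\r\n\t]*)HUN([^\w\r\n\t]*)(?=(?:,|$))")

def strip_hun_lines(lines):
    """Yield each line with its HUN fields blanked, one line at a time."""
    for line in lines:
        # Set the line ending aside so that $ sees the end of the record.
        body = line.rstrip("\r\n")
        ending = line[len(body):]
        yield PATTERN.sub(r"\1\2", body) + ending

# Any iterable of lines works; an open file object could be passed instead.
sample = io.StringIO("t,w,g,g,hun,t,w\nhun,a,HUN\n")
result = "".join(strip_hun_lines(sample))
print(result)
```

When writing to disk, each yielded line can go straight to the output file, so only one line needs to be held in memory at a time.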

Answer 2

You can do it in a faster way without regex like this:

s = 't,w,g,g,hun,t,w'
res = ','.join(['' if x.lower()=='hun' else x for x in s.split(',')])
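
The comparison above only blanks fields that are exactly hun. If the fields may carry surrounding whitespace, a slightly more tolerant variant could look like the sketch below. The helper name blank_hun_fields is made up, and it handles whitespace padding only, which is narrower than what the regex in the question accepts around HUN.

```python
def blank_hun_fields(text, target="hun"):
    """Blank every comma-separated field equal to target, ignoring case and padding."""
    fields = text.split(",")
    # strip() removes surrounding whitespace only (an assumption about the data).
    cleaned = ["" if field.strip().lower() == target else field for field in fields]
    return ",".join(cleaned)

print(blank_hun_fields("t,w, Hun ,g,hun,t"))
```

Fields that merely contain hun, such as hunter, are left alone because the whole field is compared.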
