Python regex: Difference in usage of re.sub, re.match & re.search for whitelisting

Question

All 3 statements below allow only letters, digits, underscores & hyphens to pass through. Is there a difference between using re.sub, re.match & re.search below? I.e., is it possible to have a value for str where the execution paths of the `if` statements below differ between them?

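import re

str = 'some-random-string *&- '

# Option 1: strip everything outside the whitelist and compare with the original
if re.sub(r'[^a-zA-Z0-9_-]', '', str) == str:
    pass  # do stuff

# Option 2: re.match is anchored at the start; $ anchors the end
if re.match(r'[a-zA-Z0-9_-]+$', str):
    pass  # do stuff

# Option 3: re.search with explicit ^ and $ anchors
if re.search(r'^[a-zA-Z0-9_-]+$', str):
    pass  # do stuff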

Answer 1

Using re.sub you build a new string and compare it with the original to detect whether anything was removed - that's not exactly performant.
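The cost described above comes from building a whole new string just to compare it. A minimal sketch of an alternative that keeps the same negated character class but only looks for the first offending character; the helper names are hypothetical, and note that both versions treat an empty string as valid:

```python
import re

# Character class of everything NOT on the whitelist.
DISALLOWED = re.compile(r'[^a-zA-Z0-9_-]')

def only_whitelisted_sub(value):
    # The approach from the question: strip disallowed characters, then compare.
    return DISALLOWED.sub('', value) == value

def only_whitelisted_search(value):
    # Stops at the first disallowed character; no new string is built.
    return DISALLOWED.search(value) is None

print(only_whitelisted_sub('some-random-string *&- '))     # False
print(only_whitelisted_search('some-random-string *&- '))  # False
print(only_whitelisted_search('some-random-string'))       # True
```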

Using re.search with ^ to anchor the beginning of the match is the same as using re.match, as long as re.MULTILINE is not set.
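A small sketch of that equivalence, and of the one situation where it breaks down. The helper names and sample strings are made up for illustration; with re.MULTILINE, ^ may also match right after a newline, so re.search can succeed on a later line while re.match still has to start at position 0:

```python
import re

PATTERN = r'[a-zA-Z0-9_-]+$'

def via_match(value, flags=0):
    # re.match only ever tries to match at the start of the string.
    return re.match(PATTERN, value, flags) is not None

def via_search(value, flags=0):
    # re.search scans the string; ^ pins it to the start (or to a line start
    # when re.MULTILINE is set).
    return re.search('^' + PATTERN, value, flags) is not None

# Without flags the two calls agree on every sample.
for sample in ['some-random-string', 'some-random-string *&- ', '', 'abc\n']:
    assert via_match(sample) == via_search(sample)

# With re.MULTILINE they can disagree.
tricky = 'bad value!\ngood-value'
print(via_match(tricky, re.MULTILINE))   # False
print(via_search(tricky, re.MULTILINE))  # True
```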

Using re.match is much more explicit about what you're trying to achieve: the string has to match the pattern, otherwise it's not valid. It can also short-circuit early...

In short - stick with re.match for your purposes.
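As for whether some value can send the three checks down different paths: a quick experiment suggests two edge cases worth knowing about. The re.sub comparison accepts the empty string (nothing was removed), while the + in the other patterns rejects it; and $ also matches just before a trailing newline, so 'abc\n' passes re.match and re.search but fails the re.sub comparison. The sketch below uses hypothetical helper names and adds re.fullmatch (available since Python 3.4) as one way to avoid the trailing-newline quirk; it is an illustration, not part of the original answer:

```python
import re

ALLOWED = r'[a-zA-Z0-9_-]+'

def check_sub(value):
    # Valid if removing disallowed characters changes nothing.
    return re.sub(r'[^a-zA-Z0-9_-]', '', value) == value

def check_match(value):
    # $ matches at the end of the string OR just before a final newline.
    return re.match(ALLOWED + '$', value) is not None

def check_fullmatch(value):
    # The whole string must match; no special case for a trailing newline.
    return re.fullmatch(ALLOWED, value) is not None

for sample in ['abc-123', 'abc 123', '', 'abc\n']:
    print(repr(sample), check_sub(sample), check_match(sample), check_fullmatch(sample))

# 'abc-123' True True True
# 'abc 123' False False False
# '' True False False
# 'abc\n' False True False
```

If a trailing newline must be rejected while staying with re.match, replacing $ with \Z in the pattern has the same effect as re.fullmatch here.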

1 answer

solution1
1 ACCEPTED 2014-08-27 16:42:51
