简体   繁体   中英

Do not match if word appears in regex

I have a url, and I want it to NOT match if the word 'season' is contained in the url. Here are two examples:

CONTAINS SEASON, DO NOT MATCH
'http://imdb.com/title/tt0285331/episodes?this=1&season=7&ref_=tt_eps_sn_7'

DOES NOT CONTAIN SEASON, MATCH
'http://imdb.com/title/tt0285331/

Here is what I have so far, but I'm afraid the .+ will match everything until the end. What would be the correct regex to use here?

r'http://imdb.com/title/tt(\d)+/.+^[season].+'

Use a negative lookahead:

urls='''\
http://imdb.com/title/tt0285331/episodes?this=1&season=7&ref_=tt_eps_sn_7
http://imdb.com/title/tt0285331/'''

import re

print re.findall(r'^(?!.*\bseason\b)(.*)', urls, re.M)
# ['http://imdb.com/title/tt0285331/']

You cannot use whole words inside of character classes, you have to use a Negative Lookahead .

>>> s = '''
http://imdb.com/title/tt0285331/episodes?this=1&season=7&ref_=tt_eps_sn_7
http://imdb.com/title/tt0285331/
http://imdb.com/title/tt1111111/episodes?this=2
http://imdb.com/title/tt0123456/episodes?this=1&season=1&ref_=tt_eps_sn_1'''
>>> import re
>>> re.findall(r'\bhttp://imdb.com/title/tt(?!\S+\bseason)\S+', s)
# ['http://imdb.com/title/tt0285331/', 'http://imdb.com/title/tt0285331/episodes?this=2']

Use a negative lokahead just after to tt\\d+/ ,

>>> import re
>>> s = """http://imdb.com/title/tt0285331/episodes?this=1&season=7&ref_=tt_eps_sn_7
... http://imdb.com/title/tt0285331/
... """
>>> m = re.findall(r'^http://imdb.com/title/tt\d+/(?:(?!season).)*$', s, re.M)
>>> for i in m:
...     print i
... 
http://imdb.com/title/tt0285331/

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM