I'm checking for a case-sensitive string pattern using Python 2.7 and it seems to return an incorrect match. I've run the following tests:
>>> import re
>>> rex_str = "^((BOA_[0-9]{4}-[0-9]{1,3})(?:CO)?.(?i)pdf$)"
>>> not re.match(rex_str, 'BOA_1988-148.pdf')
False
>>> not re.match(rex_str, 'BOA_1988-148.PDF')
False
>>> not re.match(rex_str, 'BOA1988-148.pdf')
True
>>> not re.match(rex_str, 'boa_1988-148.pdf')
False
The first three tests are correct, but the final test, 'boa_1988-148.pdf', should return True because the pattern is supposed to treat the first 3 characters (BOA) as case-sensitive.
I checked the expression with an online tester ( https://regex101.com/ ) and the pattern behaved as I expected there, flagging the final test as a non-match because 'boa' was lower case. Am I missing something, or do you have to explicitly declare a group as case-sensitive using a case-sensitive mode like (?c)?
Flags do not apply to portions of a regex. You told the regex engine to match case insensitively:
(?i)
From the syntax documentation:
(?aiLmsux)
(One or more letters from the set 'a', 'i', 'L', 'm', 's', 'u', 'x'.) The group matches the empty string; the letters set the corresponding flags: re.A (ASCII-only matching), re.I (ignore case), re.L (locale dependent), re.M (multi-line), re.S (dot matches all), and re.X (verbose), for the entire regular expression. (The flags are described in Module Contents.) This is useful if you wish to include the flags as part of the regular expression, instead of passing a flag argument to the re.compile() function. Flags should be used first in the expression string.
Emphasis mine: the flag applies to the whole pattern, not just a substring. If you need to match just pdf or PDF, use that in your pattern directly:
r"^((BOA_[0-9]{4}-[0-9]{1,3})(?:CO)?\.(?:pdf|PDF)$)"
This matches either .pdf or .PDF. If you need to match any mix of uppercase and lowercase, use:
r"^((BOA_[0-9]{4}-[0-9]{1,3})(?:CO)?\.[pP][dD][fF]$)"
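As a quick check of the character-class approach, the four filenames from the question can be run through the pattern. This is only a minimal sketch: the names PATTERN, names and results are illustrative, and the dot is written as \. on the assumption that only a literal period should be accepted before the extension.

```python
import re

# Extension spelled out with character classes, so no (?i) flag is needed
# and the "BOA_" prefix stays case-sensitive.
# Assumption: the dot is escaped so it matches only a literal ".".
PATTERN = re.compile(r"^((BOA_[0-9]{4}-[0-9]{1,3})(?:CO)?\.[pP][dD][fF]$)")

# The four test filenames from the question.
names = [
    "BOA_1988-148.pdf",
    "BOA_1988-148.PDF",
    "BOA1988-148.pdf",
    "boa_1988-148.pdf",
]

# Map each filename to True (matched) or False (no match).
results = {name: PATTERN.match(name) is not None for name in names}

for name in names:
    print("%s -> %s" % (name, results[name]))
```

With this pattern the lowercase 'boa_1988-148.pdf' should no longer match, while both the .pdf and .PDF variants of the correctly prefixed name still do.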
(?i) doesn't only apply after itself or to the group that contains it. From the Python 2 re documentation:
(?iLmsux)
(One or more letters from the set 'i', 'L', 'm', 's', 'u', 'x'.) The group matches the empty string; the letters set the corresponding flags […] for the entire regular expression.
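To illustrate the "entire regular expression" wording, the sketch below compares a pattern with a leading (?i) against the same pattern compiled with re.IGNORECASE and against one with no flag at all. The pattern is a simplified version of the one in the question, and the variable names are made up for this example.

```python
import re

# A leading (?i) behaves like passing re.IGNORECASE: it covers every
# literal in the pattern, including the "BOA_" prefix.
inline = re.compile(r"(?i)^BOA_[0-9]{4}-[0-9]{1,3}\.pdf$")
explicit = re.compile(r"^BOA_[0-9]{4}-[0-9]{1,3}\.pdf$", re.IGNORECASE)

# The same pattern with no flag is fully case-sensitive.
strict = re.compile(r"^BOA_[0-9]{4}-[0-9]{1,3}\.pdf$")

lower_name = "boa_1988-148.pdf"
inline_hit = inline.match(lower_name) is not None
explicit_hit = explicit.match(lower_name) is not None
strict_hit = strict.match(lower_name) is not None

print(inline_hit, explicit_hit, strict_hit)
```

Both flagged patterns accept the lowercase prefix, which is the behaviour seen in the question; only the unflagged pattern rejects it.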
One option is to do it manually:
r"^(BOA_[0-9]{4}-[0-9]{1,3})(?:CO)?\.[Pp][Dd][Ff]\Z"
Another is to use a separate case-sensitive check:
rex_str = r"(?i)^(BOA_[0-9]{4}-[0-9]{1,3})(?:CO)?\.pdf\Z"
match = re.match(rex_str, s) if s.startswith("BOA_") else None
or a separate case-insensitive one:
rex_str = r"^(BOA_[0-9]{4}-[0-9]{1,3})(?:CO)?\..{3}\Z"
match = re.match(rex_str, s) if s.lower().endswith(".pdf") else None