I don't know how to phrase this question exactly as an English title. There are 2 rules:
first, match as many characters as possible, up to the given length, at the head and at the tail;
then match the other pattern,
for example, the char before the number is not a, and the char after the number is not z.
--- edit on 20220620 --- the code below is exactly what the following table tried to express
import re

# input text -> expected match ('""' stands for "no match")
lst = {
    'abc123defg': 'abc123defg',
    'babc123defg': 'abc123defg',
    'aba123defg': '""',
    'abc123zefg': '""',
    'bc123def': 'bc123def',
    'c123def': 'c123def',
    'c123zef': '""',
    'c123d': 'c123d'
}
# first candidate pattern (overwritten by the next assignment)
reStr = r".{1,2}[^a\d]\d+[^z\d].{1,3}"
# second candidate pattern; this is the one actually tested below
reStr = r"^[A-Za-z]{1,2}[B-Zb-z]\d+[A-Ya-y][A-Za-z]{1,3}"
for key, value in lst.items():
    match = re.match(reStr, key, re.IGNORECASE | re.VERBOSE)
    if match:
        print(f'{key:15s} expected to be: {value:15s}, really get: {match.group():15s}')
    else:
        print(f'{key:15s} expected to be: {value:15s}, really get: ""')
--- the following description is the old one, which I have not edited ---
text | expected match | explanation |
---|---|---|
abc123defg | abc123defg | first read in 'abc123defg', in which c does not break [^a], and d does not break [^z], so 'abc123defg' is matched |
babc123defg | abc123defg | the leading b is dropped, then read in 'abc123defg', in which c does not break [^a], and d does not break [^z], so 'abc123defg' is matched |
aba123defg | nothing | first read in 'aba123defg', in which a breaks [^a], although d does not break [^z], so nothing is matched |
abc123zefg | nothing | first read in 'abc123zefg', in which c does not break [^a], but z does break [^z], so nothing is matched |
bc123def | bc123def | first read in 'bc123def', in which c does not break [^a], and d does not break [^z], so 'bc123def' is matched |
c123def | c123def | first read in 'c123def', in which c does not break [^a], and d does not break [^z], so 'c123def' is matched |
c123zef | nothing | first read in 'c123zef', in which c does not break [^a], but z does break [^z], so nothing is matched |
c123d | c123d | first read in 'c123d', in which c does not break [^a], and d does not break [^z], so 'c123d' is matched |
So I wrote the regular expression in Python:
import re
lst = ['abc123defg', 'aba123defg', 'abc123zefg', 'bc123def']
for text in lst:
    print(text, ' -> ', re.match(r".{1,2}[^a]\d*[^z].{1,3}", text, re.IGNORECASE | re.VERBOSE).group())
but, of course, the result is not what I expected:
abc123defg -> abc123defg
aba123defg -> aba123
abc123zefg -> abc123zef
bc123def -> bc123def
So is there a way to meet the expectation with just a regular expression? Thanks
Based on your description, I changed the regexp to:
r".{1,2}[^a\d]\d+[^z\d].{1,3}"
The point is to match the number correctly: \d+ instead of \d*. That's why I "added" \d to the negated classes before and after the number ([^a\d] and [^z\d]), so that the neighbouring characters cannot themselves be digits.

The 2 rules can be encompassed in a single regex pattern. Try this expression:
regex = re.compile(r"^[A-Za-z]{1,2}[B-Zb-z]\d+[A-Ya-y][A-Za-z]{1,3}")
It is simple and self-explanatory.
Then you will be able to reproduce your code with the expected results.
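To make the parts of that pattern explicit, here is a minimal sketch that writes the same expression with re.VERBOSE so each piece can carry a comment. The name PATTERN and the sample list are only for illustration; the list is the four strings from the question's original code.

```python
import re

# Same expression as above, spread over several lines with re.VERBOSE
# (whitespace and comments inside the pattern are ignored in this mode).
PATTERN = re.compile(
    r"""
    ^               # anchor at the start of the string
    [A-Za-z]{1,2}   # one or two leading letters
    [B-Zb-z]        # the letter before the number: any letter except a/A
    \d+             # the number itself, at least one digit
    [A-Ya-y]        # the letter after the number: any letter except z/Z
    [A-Za-z]{1,3}   # one to three trailing letters
    """,
    re.VERBOSE,
)

for text in ["abc123defg", "aba123defg", "abc123zefg", "bc123def"]:
    match = PATTERN.match(text)
    print(text, "->", match.group() if match else "no match")
```

Note that this version only accepts letters around the number, whereas the . and negated classes in the question accept any character.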
A try clause is needed to avoid the AttributeError raised when re.match returns None, that is, when the pattern does not match:
for text in lst:
    try:
        print(text, ' -> ', re.match(r"^[A-Za-z]{1,2}[B-Zb-z]\d+[A-Ya-y][A-Za-z]{1,3}", text).group())
    except AttributeError:
        print(text, ' -> No pattern found')
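As a cross-check against the edited table in the question: with re.match, both patterns suggested above appear to miss 'babc123defg', 'c123def' and 'c123d', because the match is pinned to the start of the string and at least one character is required before the one in front of the number. The sketch below is one possible adjustment, not something taken from the answers: it assumes re.search instead of re.match and relaxes the lower bounds to 0. The names EXPECTED, RELAXED and find are hypothetical.

```python
import re

# Expected results copied from the question's table ("" means no match).
EXPECTED = {
    "abc123defg": "abc123defg",
    "babc123defg": "abc123defg",
    "aba123defg": "",
    "abc123zefg": "",
    "bc123def": "bc123def",
    "c123def": "c123def",
    "c123zef": "",
    "c123d": "c123d",
}

# Assumption: {0,2} and {0,3} instead of {1,2} and {1,3}, so that short
# inputs such as 'c123d' can still match. The quantifiers stay greedy,
# so the head and tail are taken as long as possible.
RELAXED = re.compile(r".{0,2}[^a\d]\d+[^z\d].{0,3}", re.IGNORECASE)


def find(pattern, text):
    """Return the first match found anywhere in text, or "" if there is none."""
    match = pattern.search(text)
    return match.group() if match else ""


results = {text: find(RELAXED, text) for text in EXPECTED}
for text, got in results.items():
    print(f"{text:15s} expected: {EXPECTED[text]!r:15} got: {got!r}")
```

Using search rather than match is what lets the leading b of 'babc123defg' be skipped; whether that is acceptable depends on how strictly the input must start with the pattern.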