Regular expression to meet a length requirement first, then check another pattern?

Question

I don't know how to express the question exactly in an English title. There are 2 rules:

first, match as many characters as possible, up to a given length, at the head and the tail of the string
then check the other pattern

For example:

read 2~3 chars before the number and 2~4 chars after the number if the string is long enough; if the string is not long enough, read as many as possible
check that the char right before the number is not a, and the char right after the number is not z

--- edit on 20220620 --- the code below expresses exactly what the following table tries to say

import re
lst = {
'abc123defg':'abc123defg',
'babc123defg':'abc123defg',
'aba123defg':'""',
'abc123zefg':'""',
'bc123def':'bc123def',
'c123def':'c123def',
'c123zef':'""',
'c123d':'c123d'
}

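# pattern from Answer 1
reStr = r".{1,2}[^a\d]\d+[^z\d].{1,3}"
# pattern from Answer 2 (this assignment overrides the one above; keep only the one under test)
reStr = r"^[A-Za-z]{1,2}[B-Zb-z]\d+[A-Ya-y][A-Za-z]{1,3}"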

for key, value in lst.items():
    match = re.match(reStr, key, re.IGNORECASE | re.VERBOSE)
    if match:
        print(f'{key:15s} expected to be: {value:15s}, really get: {match.group():15s}')
    else:
        print(f'{key:15s} expected to be: {value:15s}, really get: ""')

--- the following description is the old one, which I have left unedited ---

text	expected match	explanation
abc123defg	abc123defg	first read in 'abc123defg', in which `c` does not break `[^a]`, and `d` does not break `[^z]`, so 'abc123defg' is matched
babc123defg	abc123defg	first read in 'abc123defg' (at most 3 chars before the number), in which `c` does not break `[^a]`, and `d` does not break `[^z]`, so 'abc123defg' is matched
aba123defg	nothing	first read in 'aba123defg', in which `a` breaks `[^a]`, and `d` does not break `[^z]`, so nothing is matched
abc123zefg	nothing	first read in 'abc123zefg', in which `c` does not break `[^a]`, but `z` breaks `[^z]`, so nothing is matched
bc123def	bc123def	first read in 'bc123def', in which `c` does not break `[^a]`, and `d` does not break `[^z]`, so 'bc123def' is matched
c123def	c123def	first read in 'c123def', in which `c` does not break `[^a]`, and `d` does not break `[^z]`, so 'c123def' is matched
c123zef	nothing	first read in 'c123zef', in which `c` does not break `[^a]`, but `z` breaks `[^z]`, so nothing is matched
c123d	c123d	first read in 'c123d', in which `c` does not break `[^a]`, and `d` does not break `[^z]`, so 'c123d' is matched

So I wrote the regular expression in Python:

import re
lst = ['abc123defg', 'aba123defg', 'abc123zefg', 'bc123def']

for text in lst:
    print(text, ' -> ', re.match(r".{1,2}[^a]\d*[^z].{1,3}", text, re.IGNORECASE | re.VERBOSE).group())

But, of course, the output is not what I expected:

abc123defg  ->  abc123defg
aba123defg  ->  aba123
abc123zefg  ->  abc123zef
bc123def  ->  bc123def

So is there a way to meet the expectation with just a regular expression? Thanks.

Answer 1

Based on your description, I changed the regexp to:

r".{1,2}[^a\d]\d+[^z\d].{1,3}"

The point is to match the number correctly:

at least one digit: \d+ instead of \d*
to know exactly what comes before and after the number, the number must be matched only by the \d+ mentioned above. That's why I added \d to the negated classes on both sides of it, giving [^a\d] and [^z\d].
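
A minimal sketch for checking a pattern against the table in the question. It uses `re.search` rather than `re.match`, because `re.match` only tries position 0 and `'babc123defg'` needs the match to start at the second character. The helper name `find` and the variant `RELAXED` (quantifiers lowered to `{0,2}` and `{0,3}` so that short strings such as `'c123d'` can still match) are assumptions made for illustration and are not part of the answer above.

```python
import re

# Pattern from the answer above: needs 2-3 chars before and 2-4 chars after the number.
ANSWER = r".{1,2}[^a\d]\d+[^z\d].{1,3}"
# Hypothetical relaxed variant: the extra context chars on both sides become optional.
RELAXED = r".{0,2}[^a\d]\d+[^z\d].{0,3}"

# Expected results taken from the question's table ('' means "nothing").
expected = {
    'abc123defg': 'abc123defg',
    'babc123defg': 'abc123defg',
    'aba123defg': '',
    'abc123zefg': '',
    'bc123def': 'bc123def',
    'c123def': 'c123def',
    'c123zef': '',
    'c123d': 'c123d',
}

def find(pattern, text):
    """Return the first match of pattern anywhere in text, or '' if there is none."""
    m = re.search(pattern, text, re.IGNORECASE)
    return m.group() if m else ''

for text, want in expected.items():
    print(f'{text:12s} expected: {want!r:14s} '
          f'answer: {find(ANSWER, text)!r:14s} relaxed: {find(RELAXED, text)!r}')
```

Running it shows where the two patterns differ: only the short inputs `'c123def'` and `'c123d'` depend on the relaxed quantifiers.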

Answer 2

The 2 rules can be encompassed in a single regex pattern. Try with this expression:

regex = re.compile(r"^[A-Za-z]{1,2}[B-Zb-z]\d+[A-Ya-y][A-Za-z]{1,3}")

Simple and self-explanatory:

Find 1 to 2 alphabetic characters
Find one more alphabetic character, excluding A and a
Continue with a sequence of one or more digits
After that, find another alphabetic character, excluding Z and z
Find 1 to 3 more alphabetic characters

With this pattern in place of yours, the four sample strings in your loop give the expected results.
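
The steps listed above can also be written into the pattern itself with `re.VERBOSE`, which ignores whitespace and allows `#` comments inside the pattern. This is only a sketch of the same expression laid out differently; the variable name `pattern` and the sample list are chosen for illustration.

```python
import re

# Same expression as above, spread over several lines so each step carries a comment.
pattern = re.compile(r"""
    ^[A-Za-z]{1,2}   # 1 to 2 alphabetic characters at the start
    [B-Zb-z]         # one more letter, anything except A or a
    \d+              # one or more digits
    [A-Ya-y]         # one letter, anything except Z or z
    [A-Za-z]{1,3}    # 1 to 3 more letters
""", re.VERBOSE)

for text in ['abc123defg', 'aba123defg', 'abc123zefg', 'bc123def']:
    m = pattern.match(text)
    print(text, '->', m.group() if m else None)
```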

A try clause is needed to avoid an AttributeError when the pattern does not match, because re.match then returns None and None has no group() method:

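import re

lst = ['abc123defg', 'aba123defg', 'abc123zefg', 'bc123def']

for text in lst:
    try:
        print(text, ' -> ', re.match(r"^[A-Za-z]{1,2}[B-Zb-z]\d+[A-Ya-y][A-Za-z]{1,3}", text).group())
    except AttributeError:
        # re.match returned None, so calling .group() on it raised AttributeError
        print(text, ' -> No pattern found')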
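
As an alternative to `try`/`except`, the return value of `re.match` can be tested directly, since it is `None` when nothing matches. A minimal sketch, in which the helper name `first_match` and its `default` parameter are made up for illustration:

```python
import re

PATTERN = re.compile(r"^[A-Za-z]{1,2}[B-Zb-z]\d+[A-Ya-y][A-Za-z]{1,3}")

def first_match(text, default='No pattern found'):
    """Return the matched text, or default when the pattern does not match."""
    m = PATTERN.match(text)
    return m.group() if m is not None else default

for text in ['abc123defg', 'aba123defg', 'abc123zefg', 'bc123def']:
    print(text, ' -> ', first_match(text))
```

This keeps the "no match" case explicit and does not hide an unrelated AttributeError raised elsewhere in the loop body.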
