
Regular expression to meet a length first, then check another pattern?

I'm not sure how to phrase this precisely in an English title. There are 2 rules:

  1. First, take as many characters as possible, up to a given length, at the head and the tail of the match.

  2. Then check the other pattern.

For example:

  1. Read 2~3 chars before the number and 2~4 chars after the number if the string is long enough; if it is not long enough, read as many as are available.
  2. Check that the char immediately before the number is not a, and the char immediately after the number is not z.

--- Edit on 20220620 --- The code below expresses exactly what the table further down tried to say:

import re
lst = {
'abc123defg':'abc123defg',
'babc123defg':'abc123defg',
'aba123defg':'""',
'abc123zefg':'""',
'bc123def':'bc123def',
'c123def':'c123def',
'c123zef':'""',
'c123d':'c123d'
}

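# Pattern from the first answer
reStr = r".{1,2}[^a\d]\d+[^z\d].{1,3}"
# Pattern from the second answer (this assignment overrides the one above)
reStr = r"^[A-Za-z]{1,2}[B-Zb-z]\d+[A-Ya-y][A-Za-z]{1,3}"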

for key, value in lst.items():
    match = re.match(reStr, key, re.IGNORECASE | re.VERBOSE)
    if match:
        print(f'{key:15s} expected to be: {value:15s}, really get: {match.group():15s}')
    else:
        print(f'{key:15s} expected to be: {value:15s}, really get: ""')

--- The description below is the original one, which I have not updated ---

text | expected find | explanation
--- | --- | ---
abc123defg | abc123defg | first read in 'abc123defg', in which c does not break [^a], and d does not break [^z], so 'abc123defg' is matched
babc123defg | abc123defg | first read in 'abc123defg', in which c does not break [^a], and d does not break [^z], so 'abc123defg' is matched
aba123defg | nothing | first read in 'aba123defg', in which a breaks [^a], although d does not break [^z], so nothing is matched
abc123zefg | nothing | first read in 'abc123zefg', in which c does not break [^a], but z breaks [^z], so nothing is matched
bc123def | bc123def | first read in 'bc123def', in which c does not break [^a], and d does not break [^z], so 'bc123def' is matched
c123def | c123def | first read in 'c123def', in which c does not break [^a], and d does not break [^z], so 'c123def' is matched
c123zef | nothing | first read in 'c123zef', in which c does not break [^a], but z breaks [^z], so nothing is matched
c123d | c123d | first read in 'c123d', in which c does not break [^a], and d does not break [^z], so 'c123d' is matched

So I wrote this regular expression in Python:

import re
lst = ['abc123defg', 'aba123defg', 'abc123zefg', 'bc123def']

for text in lst:
    print(text, ' -> ', re.match(r".{1,2}[^a]\d*[^z].{1,3}", text, re.IGNORECASE | re.VERBOSE).group())

But, of course, the result is not what I expected:

abc123defg  ->  abc123defg
aba123defg  ->  aba123
abc123zefg  ->  abc123zef
bc123def  ->  bc123def

So is there a way to meet this expectation with just a regular expression? Thanks.

Based on your description, I changed the regexp to:

r".{1,2}[^a\d]\d+[^z\d].{1,3}"

The point is to match the number correctly:

  • at least one digit: \d+ instead of \d*
  • to know exactly what comes before and after the number, the number itself must be matched only by the \d+ mentioned above. That's why I added \d to the negated classes before and after it ([^a\d] and [^z\d]).
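As a quick check of this pattern against the sample strings from the question, here is a small sketch. It uses re.search instead of re.match (my assumption, so that an extra leading character such as the b in babc123defg can be skipped). The names ANSWER_PATTERN, RELAXED_PATTERN and find are hypothetical, and the relaxed variant with {0,2} and {0,3} is likewise an assumption, added only to cover short inputs such as c123d; it is not part of the answer above.

```python
import re

# Pattern from the answer above: needs at least 2 chars before the number
# and at least 2 chars after it.
ANSWER_PATTERN = re.compile(r".{1,2}[^a\d]\d+[^z\d].{1,3}")

# Hypothetical relaxed variant (an assumption, not from the answer):
# 0-2 leading and 0-3 trailing non-digit chars, so short strings still match.
RELAXED_PATTERN = re.compile(r"\D{0,2}[^a\d]\d+[^z\d]\D{0,3}")


def find(pattern, text):
    """Return the first match anywhere in text, or '' if there is none."""
    m = pattern.search(text)
    return m.group() if m else ""


# Inputs and expected results, taken from the question's edited test code.
samples = {
    "abc123defg": "abc123defg",
    "babc123defg": "abc123defg",
    "aba123defg": "",
    "abc123zefg": "",
    "bc123def": "bc123def",
    "c123def": "c123def",
    "c123zef": "",
    "c123d": "c123d",
}

for text, expected in samples.items():
    print(f"{text:12s} expected={expected!r:13s} "
          f"answer={find(ANSWER_PATTERN, text)!r:13s} "
          f"relaxed={find(RELAXED_PATTERN, text)!r}")
```

Under these assumptions, the answer's pattern finds nothing for c123def and c123d, because it insists on two characters on each side of the number, while the relaxed variant reproduces every row of the table.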

The 2 rules can be encompassed in a single regex pattern. Try with this expression:

regex = re.compile(r"^[A-Za-z]{1,2}[B-Zb-z]\d+[A-Ya-y][A-Za-z]{1,3}")

Simple and self-explanatory:

  • Find 1 to 2 alphabetic characters
  • Find one more alphabetic character, excluding chars A | a
  • Continue with a sequence of one or more digits
  • After that, find another alphabetic character, excluding chars Z | z
  • Find 1 to 3 more alphabetic characters
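The breakdown above maps one-to-one onto the pattern. As a sketch, the same expression can be written with re.VERBOSE so that each piece carries its own comment; the name VERBOSE_PATTERN is mine, not part of the original code.

```python
import re

# Same expression as above, split across lines with re.VERBOSE so that
# whitespace is ignored and '#' starts a comment inside the pattern.
VERBOSE_PATTERN = re.compile(
    r"""
    ^[A-Za-z]{1,2}   # 1 to 2 alphabetic characters at the start
    [B-Zb-z]         # one more letter, anything except A / a
    \d+              # one or more digits
    [A-Ya-y]         # one letter, anything except Z / z
    [A-Za-z]{1,3}    # 1 to 3 more letters
    """,
    re.VERBOSE,
)

for text in ["abc123defg", "aba123defg", "abc123zefg", "bc123def"]:
    m = VERBOSE_PATTERN.match(text)
    print(text, " -> ", m.group() if m else "No pattern found")
```

Note that this pattern requires at least two letters on each side of the number, so a short input such as c123def does not match it.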

Then you will be able to reproduce your code with the expected results.

A try clause is one way to avoid the AttributeError raised when .group() is called on None, which is what re.match returns when the pattern is not matched:

for text in lst:
    try:
        print(text, ' -> ', re.match(r"^[A-Za-z]{1,2}[B-Zb-z]\d+[A-Ya-y][A-Za-z]{1,3}", text).group())
    except AttributeError:
        print(text, ' -> No pattern found')
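
An alternative to catching AttributeError is to test the match object directly, since re.match returns None when nothing matches. A minimal sketch, where the helper name match_or_default and the constant LETTERS_PATTERN are hypothetical:

```python
import re

# Same pattern as in the loop above, written as a raw string.
LETTERS_PATTERN = r"^[A-Za-z]{1,2}[B-Zb-z]\d+[A-Ya-y][A-Za-z]{1,3}"


def match_or_default(text, default="No pattern found"):
    """Return the matched text, or `default` when re.match returns None."""
    m = re.match(LETTERS_PATTERN, text)
    return m.group() if m is not None else default


for text in ["abc123defg", "aba123defg", "abc123zefg", "bc123def"]:
    print(text, " -> ", match_or_default(text))
```

This keeps the no-match case explicit and avoids hiding unrelated AttributeErrors behind the except clause.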


 