I am trying to recognize a pattern in a list of words: I need to find a number of 1 to 6 digits, with or without other characters around it.
My input is this image: https://i.stack.imgur.com/RNOdL.png
Running OCR on it, I obtained:
Kundennummer:
21924
The pattern r"(\D|\A)+ \d{5} (\D|\Z)+" works, but when I change it to r"(\D|\A)+ \d{1,6} (\D|\Z)+" it doesn't.
I tried re.match, re.findall and re.search, and none of them works.
The repr() of the OCR output:
'Kundennummer:'
'21924'
Assuming you only need the first match:
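import re

ocr_result = """
Kundennummer:
21924
"""

# Take the first run of digits that is 1 to 6 digits long
for result in re.findall(r'\d+', ocr_result):
    if 1 <= len(result) <= 6:
        break
else:
    # The else branch runs only if the loop finished without a break
    result = None

print(result)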
Result:
21924
import re

ocr_result1 = """
Kundennummer:
21924
"""
ocr_result2 = """
Kundennummer:3000
"""

# Find each run of word characters that contains a digit
for e in [ocr_result1, ocr_result2]:
    print(re.findall(r'\w*\d{1,6}\w*', e))
['21924']
['3000']
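One caveat with the pattern above: \w also matches digits, so \w*\d{1,6}\w* does not enforce the upper bound of 6 and will return a longer run such as '1234567' in full. If the limit matters, a possible variant is to guard \d{1,6} with lookarounds so that the match cannot be part of a longer run of digits. This is only a sketch; the names NUMBER_RE and find_numbers and the third sample string are made up for illustration and do not come from the question.

```python
import re

# Hypothetical names, chosen for this sketch only.
# (?<!\d) rejects a match preceded by a digit, (?!\d) rejects one followed by a digit,
# so only complete runs of 1 to 6 digits are returned.
NUMBER_RE = re.compile(r"(?<!\d)\d{1,6}(?!\d)")

def find_numbers(text):
    """Return every standalone run of 1 to 6 digits found in text."""
    return NUMBER_RE.findall(text)

samples = [
    "Kundennummer:\n21924\n",
    "Kundennummer:3000\n",
    "Kundennummer:1234567\n",  # assumed example of a number that is too long
]
for sample in samples:
    print(find_numbers(sample))
```

With these samples the first two calls return the number from the question and the 7-digit sample returns an empty list. Take the first element of the list if only the first match is needed.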