How to find a specific pattern in a list

Question

I'm trying to build a PDF reader in Python. I can already read the PDF and get a list with its content. Now I want to get back only the numbers with eleven digits, like 123.456.789-33 or 124.323.432.33.

from PyPDF2 import PdfReader
import re
reader = PdfReader(r"\\abcdacd.pdf")

number_of_pages = len(reader.pages)
page = reader.pages[0]
text = page.extract_text()
num = re.findall(r'\d+', text)
print(num)

Here's the output:

['01', '01', '2000', '26', '12', '2022', '04483203983', '044', '832', '039', '83', '20210002691450', '5034692', '79', '2020', '8', '24', '0038', '1', '670', '03', '2', '14', '2', '14', '1', '670', '03', '2', '14', '2', '14', '1', '1', '8', '21', '1']

If someone could help me, I'd be really thankful.

Answer 1

Change the regex pattern to the following, which matches three groups of three digits and a final group of two, with an optional separator between groups:

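import re

s = 'text text 123.456.789-33 or 124.323.432.33 text or 12323112333 or even 123,231,123,33 '
# 3 + 3 + 3 + 2 digits; '.' or ',' may separate the groups, and '-' is also allowed before the last two
num = re.findall(r'\d{3}[.,]?\d{3}[.,]?\d{3}[.,-]?\d{2}', s)
print(num)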

['123.456.789-33', '124.323.432.33', '12323112333', '123,231,123,33']
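One caveat: the output in the question also contains longer digit runs such as 20210002691450, and the pattern above would match the first eleven digits of such a run. A minimal sketch of one way to guard against that, assuming the numbers are never directly adjacent to other digits, is to wrap the pattern in lookarounds. The sample string here is hypothetical, loosely built from values in the question's output:

```python
import re

# Hypothetical sample text, loosely based on values from the question's output.
sample = "ID 044.832.039-83 doc 20210002691450 plain 04483203983"

# Same pattern as above, plus (?<!\d) and (?!\d) so that a match
# cannot start or end in the middle of a longer run of digits.
pattern = re.compile(r'(?<!\d)\d{3}[.,]?\d{3}[.,]?\d{3}[.,-]?\d{2}(?!\d)')

matches = pattern.findall(sample)
print(matches)  # ['044.832.039-83', '04483203983']
```

The 14-digit number is skipped because its first eleven digits are followed by another digit.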

Answer 2

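You can try the following, which matches exactly eleven digits, each optionally followed by any number of '.' or '-' separators: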

\b(?:\d[.-]*){11}\b


import re

s = '''\
123.456.789-33
124.323.432.33
111-2-3-4-5-6-7-8-9'''

pat = re.compile(r'\b(?:\d[.-]*){11}\b')
for m in pat.findall(s):
    print(m)

Prints:

123.456.789-33
124.323.432.33
111-2-3-4-5-6-7-8-9
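Since this pattern accepts separators in any position, the matches come back in mixed formats. If a uniform digits-only form is wanted, a possible follow-up step (an assumption about what is needed, not part of the answer above) is to strip every non-digit character from each match:

```python
import re

text = "123.456.789-33\n124.323.432.33\n111-2-3-4-5-6-7-8-9"

pat = re.compile(r'\b(?:\d[.-]*){11}\b')

# Remove every non-digit character so each match is a plain 11-digit string.
digits_only = [re.sub(r'\D', '', m) for m in pat.findall(text)]
print(digits_only)  # ['12345678933', '12432343233', '11123456789']
```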

2 answers

solution1
1 ACCEPTED 2023-01-03 12:35:09

solution2
0 2023-01-03 13:46:43
