Regex include only one digit between chars

Question

I have to parse a PDF document, and I'm using PyPDF2 together with the re (regex) module.

The file includes several lines like the one below:

18-02-202010:44:48PEDMILANO OVEST- BINASCOA1,40

From this line I need to extract the text between the time and the amount:

PEDMILANO OVEST- BINASCOA

The following code works, but sometimes it finds nothing because a digit can appear inside that text, for example: 18-02-202010:44:48PEDMILANO OVE3ST- BINASCOA1,40

regex = re.compile(r'\d\d-\d\d-\d\d\d\d\d\d:\d\d:\d\d\D+\d+,\d\d')

Is there a way to include a number in this regular expression?
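
A minimal reproduction of the problem, using the two sample lines from the question (the variable names are only illustrative). Since \D+ matches non-digit characters only, the 3 in OVE3ST stops it early, and the rest of the pattern can no longer match:

```python
import re

# Pattern from the question: \D+ matches non-digit characters only.
original = re.compile(r'\d\d-\d\d-\d\d\d\d\d\d:\d\d:\d\d\D+\d+,\d\d')

clean = '18-02-202010:44:48PEDMILANO OVEST- BINASCOA1,40'
with_digit = '18-02-202010:44:48PEDMILANO OVE3ST- BINASCOA1,40'

clean_match = original.search(clean)
digit_match = original.search(with_digit)

print(clean_match is not None)  # the line without a stray digit matches
print(digit_match is None)      # the digit in "OVE3ST" breaks the match
```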

Answer 1

The following should simplify the current regex:

import re

s = '18-02-202010:44:48PEDMILANO OVE3ST- BINASCOA1,40'

re.search(r'\:\d+([A-Z].*?)(?=\d+\,\d+$)', s).group(1)
# 'PEDMILANO OVE3ST- BINASCOA'

See demo

\:\d+([A-Z].*?)(?=\d+\,\d+$)
- \: matches the character : literally
- \d+ matches a digit (equal to [0-9])
  - + Quantifier: matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
- 1st Capturing Group ([A-Z].*?)
  - [A-Z] matches a single character in the range between A (index 65) and Z (index 90) (case sensitive)
  - .*? matches any character (except for line terminators)
  - *? Quantifier: matches between zero and unlimited times, as few times as possible, expanding as needed (lazy)
- Positive Lookahead (?=\d+\,\d+$): asserts that the regex below matches
  - \d+ matches a digit (equal to [0-9]), one or more times (greedy)
  - \, matches the character , literally
  - \d+ matches a digit (equal to [0-9]), one or more times (greedy)
  - $ asserts position at the end of the string (or at the end of a line when re.MULTILINE is set)
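
As a rough sketch of how the pattern above could be reused across many lines, it can be wrapped in a small helper. The name extract_description is made up for this example, and the second sample line (with a two-digit amount before the comma) is an assumed variation, not data from the question. Note that the pattern expects the captured text to start with an uppercase letter A-Z.

```python
import re

# Pattern from the answer above, compiled once for reuse.
DESCRIPTION = re.compile(r'\:\d+([A-Z].*?)(?=\d+\,\d+$)')

def extract_description(line):
    """Return the text between the time and the amount, or None if absent."""
    match = DESCRIPTION.search(line)
    return match.group(1) if match else None

# The lazy .*? grows one character at a time until the lookahead
# sees "digits , digits" followed by the end of the string.
print(extract_description('18-02-202010:44:48PEDMILANO OVE3ST- BINASCOA1,40'))
print(extract_description('18-02-202010:44:48PEDMILANO OVEST- BINASCOA12,40'))
print(extract_description('no timestamp or amount here'))
```

If the text itself can end in a digit, the boundary between text and amount becomes ambiguous; the lazy quantifier then assigns every trailing digit to the amount.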

Answer 2

I suggest using

import re
text = "18-02-202010:44:48PEDMILANO OVEST- BINASCOA1,40"
print( re.sub(r'^\d{2}-\d{2}-\d{5,6}:\d{2}:\d{2}(.*?)\d+(?:,\d+)?$', r'\1', text) )

It can also be written as

re.sub(r'^\d{2}-\d{2}-\d{5,6}:\d{2}:\d{2}|\d+(?:,\d+)?$', '', text)

Or, if you prefer matching and capturing:

m = re.search(r'^\d{2}-\d{2}-\d{5,6}:\d{2}:\d{2}(.*?)\d+(?:,\d+)?$', text)
if m:
    print( m.group(1) )

See an online Python demo. With this solution, the text between the time and the amount may start with any character and contain any characters except line breaks, since each record is on a single line.
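
If the extracted page text arrives as one multi-line string, the same pattern might be applied to every line at once with re.MULTILINE, so that ^ and $ anchor at each line. This is only a sketch: page_text is a hand-written stand-in for whatever the PDF extraction returns, and the non-record line in the middle is an assumption about what such text may contain.

```python
import re

# Same pattern as above; re.MULTILINE makes ^ and $ match at every line.
RECORD_LINES = re.compile(
    r'^\d{2}-\d{2}-\d{5,6}:\d{2}:\d{2}(.*?)\d+(?:,\d+)?$',
    re.MULTILINE,
)

# Stand-in for text extracted from a PDF page (assumed layout).
page_text = (
    "18-02-202010:44:48PEDMILANO OVEST- BINASCOA1,40\n"
    "a line that is not a record\n"
    "18-02-202010:44:48PEDMILANO OVE3ST- BINASCOA1,40\n"
)

# With a single capturing group, findall returns the group-1 strings.
descriptions = RECORD_LINES.findall(page_text)
print(descriptions)
```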

Regex details

^ - start of string
\d{2}-\d{2}-\d{5,6}:\d{2}:\d{2} - datetime string: two digits, -, two digits, -, five or six digits (the four-digit year fused with a one- or two-digit hour), :, two digits, :, two digits
(.*?) - Group 1: any zero or more chars other than line break chars, as few as possible
\d+(?:,\d+)? - an int/float value pattern: 1+ digits followed by an optional sequence of , and 1+ digits
$ - end of string.

See the regex demo.
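
Building on the breakdown above, one possible variation uses named groups so that the date, time, and amount are captured alongside the text. This is a sketch rather than part of the answer: the group names and the parse_record helper are hypothetical, and it assumes the year always has four digits, so the \d{5,6} run is split into \d{4} plus a one- or two-digit hour.

```python
import re
from decimal import Decimal

# Named-group variant; assumes a four-digit year fused with the hour.
RECORD_FIELDS = re.compile(
    r'^(?P<date>\d{2}-\d{2}-\d{4})'
    r'(?P<time>\d{1,2}:\d{2}:\d{2})'
    r'(?P<description>.*?)'
    r'(?P<amount>\d+(?:,\d+)?)$'
)

def parse_record(line):
    """Split one record line into its fields, or return None on no match."""
    match = RECORD_FIELDS.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # The amount uses a decimal comma; convert it to a Decimal.
    fields['amount'] = Decimal(fields['amount'].replace(',', '.'))
    return fields

record = parse_record('18-02-202010:44:48PEDMILANO OVE3ST- BINASCOA1,40')
print(record)
```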

2 answers

solution1
1 ACCEPTED 2020-03-31 14:16:42

solution2
1 2020-03-31 14:22:01
