How do I select variable Regular expression using Python?

Question

I have some lines like the ones below. Some contain only numbers in the parentheses, while others also have a string before them:

'abc'            (17245...64590)
'cde'            (12244...67730)
'dsa'            complement (12345...67890)

I would like to extract both formats, with and without the string. So, the first two lines should yield only the numbers, while the third line should also yield the string before the numbers.

I am using this command to achieve this.

result = re.findall("\bcomplement\b|\d+", line)

Any idea how to do it? The expected output would be like this:

17245, 64590
12244, 67730
complement, 12345, 67890

Answer 1

If the number of digit chunks inside the parentheses is always 2 and they are separated with 1+ dots, use:

re.findall(r'\s{2,}(?:(\w+)\s*)?\((\d+)\.+(\d+)\)', s)

See the regex demo. A sample Python demo:

import re
s= ''''abc'            (17245...64590)
'cde'            (12244...67730)
'dsa'            complement (12345...67890)'''
rx = r"\s{2,}(?:(\w+)\s*)?\((\d+)\.+(\d+)\)"
for x in re.findall(rx, s):
    print(", ".join([y for y in x if y]))

Details

\s{2,} - 2 or more whitespaces
(?:(\w+)\s*)? - an optional sequence of:
- (\w+) - Group 1: one or more word chars
- \s* - 0+ whitespaces
\( - a ( char
(\d+) - Group 2: one or more digits
\.+ - 1 or more dots
(\d+) - Group 3: one or more digits
\) - a ) char.
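
The breakdown above can also be written directly into the pattern. The following is a minimal sketch, assuming the same three-group pattern, compiled with re.VERBOSE so that each part carries its own comment. The variable names with_word and without_word are only illustrative.

```python
import re

# Same pattern as above; re.VERBOSE ignores the layout whitespace
# and treats everything after # as a comment.
rx = re.compile(r"""
    \s{2,}          # 2 or more whitespace chars
    (?:(\w+)\s*)?   # optional: Group 1 (word chars), then 0+ whitespace
    \(              # a literal opening parenthesis
    (\d+)           # Group 2: one or more digits
    \.+             # 1 or more dots
    (\d+)           # Group 3: one or more digits
    \)              # a literal closing parenthesis
""", re.VERBOSE)

with_word = rx.search("'dsa'            complement (12345...67890)")
without_word = rx.search("'abc'            (17245...64590)")

print(with_word.groups())     # Group 1 holds the word
print(without_word.groups())  # Group 1 did not participate, so it is None
```

With re.search, an optional group that did not participate is reported as None, while re.findall reports it as an empty string. That is why the demo filters the tuple with `if y` before joining.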

If the number of digit chunks inside the parentheses can vary, you may use:

import re
s= ''''abc'            (17245...64590)
'cde'            (12244...67730)
'dsa'            complement (12345...67890)'''
for m in re.finditer(r'\s{2,}(?:(\w+)\s*)?\(([\d.]+)\)', s):
    res = []
    if m.group(1):
        res.append(m.group(1))
    res.extend(re.findall(r'\d+', m.group(2)))
    print(", ".join(res))

Both Python snippets output:

17245, 64590
12244, 67730
complement, 12345, 67890

See the online Python demo. Note that it can match any number of digit chunks inside the parentheses, and it assumes that there are at least 2 whitespace chars between Column 1 and Column 2.
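
To illustrate the "any number of digit chunks" point, here is a small sketch that wraps the same approach in a function. The function name extract_fields, the word join, and the three-chunk sample line are made up for the example and are not part of the original data.

```python
import re

# Group 1: optional word before the parentheses.
# Group 2: everything inside the parentheses (digits and dots only).
ROW = re.compile(r"\s{2,}(?:(\w+)\s*)?\(([\d.]+)\)")

def extract_fields(text):
    """Return one list of fields per matching row (hypothetical helper)."""
    rows = []
    for m in ROW.finditer(text):
        fields = [m.group(1)] if m.group(1) else []
        # Pull every digit chunk out of the captured "digits and dots" run.
        fields.extend(re.findall(r"\d+", m.group(2)))
        rows.append(fields)
    return rows

# The second line is an invented example with three digit chunks.
sample = "'abc'            (17245...64590)\n'xyz'            join (1..20..300)"
for fields in extract_fields(sample):
    print(", ".join(fields))
```

The second sample line yields three numbers after the word, which the fixed two-group pattern would not match.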

See the regex demo, too. The difference from the first pattern is that there is no third group: the second and third groups are replaced with a single second group, ([\d.]+), that captures 1 or more dots or digits (the digits are later extracted with re.findall(r'\d+', m.group(2))).
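
As a side note on the attempt in the question: the pattern there is written as a regular string literal, so Python turns "\b" into a backspace character before the regex engine ever sees it. The sketch below, using the third sample line, shows the effect of adding the r prefix. The variable names are illustrative only.

```python
import re

line = "'dsa'            complement (12345...67890)"

# In a normal string literal, "\b" is the backspace character (\x08),
# so the "complement" alternative can never match this line.
# "\\d" is spelled with two backslashes here only to avoid an
# invalid-escape warning; the resulting string equals the question's pattern.
non_raw = re.findall("\bcomplement\b|\\d+", line)

# In a raw string, \b reaches the regex engine as a word boundary.
raw = re.findall(r"\bcomplement\b|\d+", line)

print(non_raw)
print(raw)
```

With the raw string, this simpler pattern already returns the word followed by the numbers for this line, although it does not check the column layout or the parentheses the way the patterns above do.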


solution1
0 2017-10-06 21:23:58
