I have some lines like below with numbers and strings. Some have only numbers while some have some strings as well before them:
'abc'    (17245...64590)
'cde'    (12244...67730)
'dsa'    complement (12345...67890)
I would like to extract both formats, with and without the string. The result for the first two lines should contain only the numbers, while the result for the third line should also contain the string that comes before the numbers.
I am using this command to try to achieve this:
result = re.findall("\bcomplement\b|\d+", line)
Any idea how to do this? The expected output would be:
17245, 64590
12244, 67730
complement, 12345, 67890
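One likely reason the command in the question returns only the numbers is that the pattern is not a raw string: in a normal Python string literal, "\b" is the backspace character rather than a regex word boundary. The sketch below only illustrates that point on two of the sample lines; the variable names are made up for the example.

```python
import re

line = "'dsa' complement (12345...67890)"

# In a normal string literal, "\b" is the backspace character (0x08),
# so the regex engine never sees a word-boundary assertion.
assert "\b" == "\x08"

# A raw string keeps the backslash, so \b reaches the regex engine intact.
pattern = re.compile(r"\bcomplement\b|\d+")

print(pattern.findall(line))                     # ['complement', '12345', '67890']
print(pattern.findall("'abc' (17245...64590)"))  # ['17245', '64590']
```

This keeps the original idea of the command, but it would also pick up any digits that appear in the quoted first column, so it is less strict than a pattern anchored on the parentheses.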
If the number of digit chunks inside the parentheses is always 2 and they are separated with 1+ dots, use
re.findall(r'\s{2,}(?:(\w+)\s*)?\((\d+)\.+(\d+)\)', s)
See the regex demo. A sample Python demo:
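import re
# The columns are separated by at least 2 whitespace chars
s = ''''abc'    (17245...64590)
'cde'    (12244...67730)
'dsa'    complement (12345...67890)'''
rx = r"\s{2,}(?:(\w+)\s*)?\((\d+)\.+(\d+)\)"
for x in re.findall(rx, s):
    # Skip the empty Group 1 value when there is no word before the parentheses
    print(", ".join([y for y in x if y]))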
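The pattern above can also be written in verbose mode so that each piece carries its own comment. This is only a readability sketch of the same regex; the name RX and the exact column spacing in the sample are assumptions made for the example.

```python
import re

RX = re.compile(r"""
    \s{2,}            # two or more whitespace chars before the second column
    (?:(\w+)\s*)?     # optional word (Group 1) followed by optional whitespace
    \(                # a literal opening parenthesis
    (\d+)             # Group 2: first run of digits
    \.+               # one or more literal dots
    (\d+)             # Group 3: second run of digits
    \)                # a literal closing parenthesis
""", re.VERBOSE)

s = """'abc'    (17245...64590)
'cde'    (12244...67730)
'dsa'    complement (12345...67890)"""

for groups in RX.findall(s):
    # Group 1 is an empty string when the optional word is absent
    print(", ".join(g for g in groups if g))
```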
Details
\s{2,} - 2 or more whitespaces
(?:(\w+)\s*)? - an optional sequence of:
  (\w+) - Group 1: one or more word chars
  \s* - 0+ whitespaces
\( - a ( char
(\d+) - Group 2: one or more digits
\.+ - 1 or more dots
(\d+) - Group 3: one or more digits
\) - a ) char.
If the number of digit chunks inside the parentheses can vary, you may use
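import re
# The columns are separated by at least 2 whitespace chars
s = ''''abc'    (17245...64590)
'cde'    (12244...67730)
'dsa'    complement (12345...67890)'''
for m in re.finditer(r'\s{2,}(?:(\w+)\s*)?\(([\d.]+)\)', s):
    res = []
    # Group 1 holds the optional word before the parentheses
    if m.group(1):
        res.append(m.group(1))
    # Group 2 holds the digits and dots; pull out every digit chunk
    res.extend(re.findall(r'\d+', m.group(2)))
    print(", ".join(res))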
Both Python snippets output:
17245, 64590
12244, 67730
complement, 12345, 67890
See the online Python demo. Note that it can match any number of digit chunks inside the parentheses, and it assumes that there are at least 2 whitespace chars between Column 1 and Column 2.
See the regex demo, too. The difference from the first one is that there is no third group: the second and third groups are replaced with a single second group, ([\d.]+), that captures 1 or more dots or digits (the digits are later extracted with re.findall(r'\d+', m.group(2))).
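As a rough sketch, the variable-chunk approach can be wrapped in a small function that returns the fields instead of printing them. The names extract_fields and ROW_RX, and the third sample row with three digit chunks, are hypothetical and only there to exercise the pattern.

```python
import re

# Same pattern as in the second snippet, compiled once for reuse
ROW_RX = re.compile(r"\s{2,}(?:(\w+)\s*)?\(([\d.]+)\)")

def extract_fields(text):
    """Return one list per match: the optional word, then every digit chunk."""
    rows = []
    for m in ROW_RX.finditer(text):
        fields = [m.group(1)] if m.group(1) else []
        fields.extend(re.findall(r"\d+", m.group(2)))
        rows.append(fields)
    return rows

sample = """'abc'    (17245...64590)
'dsa'    complement (12345...67890)
'xyz'    join (1..5..9)"""

for row in extract_fields(sample):
    print(", ".join(row))
# 17245, 64590
# complement, 12345, 67890
# join, 1, 5, 9
```

Returning lists keeps the formatting decision with the caller, which may be handy if the numbers need to be converted to int later.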