
How do I match a variable format with a regular expression in Python?

I have some lines like the ones below, with numbers and strings. Some have only numbers, while others also have a string before the numbers:

'abc'            (17245...64590)
'cde'            (12244...67730)
'dsa'            complement (12345...67890)

I would like to handle both formats, with and without a string before the numbers. So the results for the first two lines should contain only the numbers, while the result for the third line should also contain the string before the numbers.

I am using this command to achieve this:

result = re.findall("\bcomplement\b|\d+", line)

Any idea how to do it? The expected output would be:

17245, 64590
12244, 67730
complement, 12345, 67890
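A side note on the attempt in the question, offered as a likely explanation rather than a confirmed diagnosis: the pattern is written as a normal (non-raw) string, and inside such a Python literal "\b" is the backspace character (\x08), not the regex word boundary. The sketch below runs both variants on the third sample line; the names `line`, `broken` and `fixed` are illustrative.

```python
import re

line = "'dsa'            complement (12345...67890)"

# In a normal string literal "\b" is the backspace character (\x08), so the
# "complement" alternative looks for backspaces and never matches this line.
# ("\\d+" escapes its backslash only to avoid an invalid-escape warning.)
broken = re.findall("\bcomplement\b|\\d+", line)

# With the r prefix the regex engine receives \b and treats it as a word boundary.
fixed = re.findall(r"\bcomplement\b|\d+", line)

print(broken)  # ['12345', '67890']
print(fixed)   # ['complement', '12345', '67890']
```

With the r prefix the attempt returns the expected items for these three sample lines, but it would also pick up any digits that happen to appear in the first column, which is where anchoring on the parentheses, as the answer below does, helps.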

If there are always exactly two digit chunks inside the parentheses, separated by one or more dots, use

re.findall(r'\s{2,}(?:(\w+)\s*)?\((\d+)\.+(\d+)\)', s)

See the regex demo. And here is a sample Python demo:

import re

s = """'abc'            (17245...64590)
'cde'            (12244...67730)
'dsa'            complement (12345...67890)"""
rx = r"\s{2,}(?:(\w+)\s*)?\((\d+)\.+(\d+)\)"
for x in re.findall(rx, s):
    # Group 1 is an empty string when the optional word is absent, so skip it
    print(", ".join(y for y in x if y))

Details

  • \s{2,} - two or more whitespace characters
  • (?:(\w+)\s*)? - an optional sequence of:
    • (\w+) - Group 1: one or more word characters
    • \s* - zero or more whitespace characters
  • \( - a literal ( character
  • (\d+) - Group 2: one or more digits
  • \.+ - one or more dots
  • (\d+) - Group 3: one or more digits
  • \) - a literal ) character
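As a supplementary sketch, the same two-chunk pattern can be written with Python's re.VERBOSE flag so that each component from the list above carries its explanation inline. The pattern is unchanged; only its layout differs, and the variable names are illustrative.

```python
import re

# The two-chunk pattern, laid out with re.VERBOSE: whitespace in the pattern
# is ignored and everything after an unescaped # is a comment.
rx = re.compile(r"""
    \s{2,}          # two or more whitespace characters (the column gap)
    (?:(\w+)\s*)?   # optional: Group 1 is a word such as complement, plus trailing spaces
    \(              # a literal opening parenthesis
    (\d+)           # Group 2: the first run of digits
    \.+             # one or more dots
    (\d+)           # Group 3: the second run of digits
    \)              # a literal closing parenthesis
""", re.VERBOSE)

s = """'abc'            (17245...64590)
'cde'            (12244...67730)
'dsa'            complement (12345...67890)"""

# findall returns one tuple of three groups per match; drop the empty Group 1.
lines = [", ".join(g for g in groups if g) for groups in rx.findall(s)]
for line in lines:
    print(line)
```

In verbose mode unescaped whitespace is ignored, so a literal space inside such a pattern would need to be written as \s or as an escaped space.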

If the number of digit chunks inside the parentheses can vary, you may use

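import re

s = """'abc'            (17245...64590)
'cde'            (12244...67730)
'dsa'            complement (12345...67890)"""
# Group 2 captures the whole run of digits and dots inside the parentheses
for m in re.finditer(r'\s{2,}(?:(\w+)\s*)?\(([\d.]+)\)', s):
    res = []
    if m.group(1):  # the optional word, e.g. "complement"
        res.append(m.group(1))
    res.extend(re.findall(r'\d+', m.group(2)))  # split out each digit chunk
    print(", ".join(res))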

Both Python snippets output:

17245, 64590
12244, 67730
complement, 12345, 67890

See the online Python demo. Note that it can match any number of digit chunks inside the parentheses, and that it assumes there are at least two whitespace characters between column 1 and column 2.
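To make the whitespace assumption concrete, here is a small check using two hypothetical lines that differ only in the gap between the columns; `rx` is the pattern from the second snippet, and the sample lines are made up for illustration.

```python
import re

rx = r"\s{2,}(?:(\w+)\s*)?\(([\d.]+)\)"

# Hypothetical lines: one space vs. two spaces between column 1 and column 2.
one_space = "'xyz' complement (11...22)"
two_spaces = "'xyz'  complement (11...22)"

no_match = re.search(rx, one_space)  # \s{2,} never finds two whitespace chars in a row
match = re.search(rx, two_spaces)

print(no_match)        # None
print(match.groups())  # ('complement', '11...22')
```

If the real data can use a single space between the columns, relaxing \s{2,} to \s+ is one option to test against that data.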

See the regex demo, too. The difference from the first pattern is that there is no third group: the second and third groups are replaced with a single second group, ([\d.]+), that captures one or more dots or digits (the digits are later extracted with re.findall(r'\d+', m.group(2))).
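A quick way to see that difference is to run both patterns on hypothetical lines with three and four digit chunks. This is a sketch; the sample lines and variable names are made up for illustration.

```python
import re

two_chunk_rx = r"\s{2,}(?:(\w+)\s*)?\((\d+)\.+(\d+)\)"
any_chunk_rx = re.compile(r"\s{2,}(?:(\w+)\s*)?\(([\d.]+)\)")

# Hypothetical lines with more than two digit chunks inside the parentheses.
s = """'xyz'            (1...22...333)
'uvw'            complement (4..55.666...7777)"""

# The two-chunk pattern needs ")" right after the second digit run, so it fails here.
two_chunk_result = re.findall(two_chunk_rx, s)

out = []
for m in any_chunk_rx.finditer(s):
    # Group 2 holds digits and dots together, e.g. '1...22...333';
    # re.findall(r"\d+", ...) then splits out the individual digit runs.
    parts = [m.group(1)] if m.group(1) else []
    parts.extend(re.findall(r"\d+", m.group(2)))
    out.append(", ".join(parts))

print(two_chunk_result)  # []
print(out)               # ['1, 22, 333', 'complement, 4, 55, 666, 7777']
```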

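Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you need to reprint this content, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.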

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM