
Match numbers if line starts with keyword

I've got a file that looks like this:

    foo: 11.00 12.00  bar 13.00
    bar: 11.00 12.00 bar
    foo: 11.00 12.00

and would like to extract all numbers from the lines that begin with the keyword "foo:". Expected result:

    ['11.00', '12.00', '13.00']
    ['11.00', '12.00']

Now, this is easy if I use two regexes, like this:

    if re.match(r'^foo:', line):
        numbers = re.findall(r'\d+\.\d+', line)

but I was wondering whether it is possible to combine these into a single regex.

Thanks for your help, MD
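As a baseline for comparing the answers, here is a minimal runnable sketch of the two-regex approach from the question. It assumes the three sample lines are read from an in-memory io.StringIO object standing in for the real file. Because re.match only matches at the start of the string, the ^ anchor is not strictly needed.

```python
import io
import re

# The sample lines from the question; io.StringIO stands in for the real file.
sample = io.StringIO(
    "foo: 11.00 12.00  bar 13.00\n"
    "bar: 11.00 12.00 bar\n"
    "foo: 11.00 12.00\n"
)

results = []
for line in sample:
    # First regex: keep only lines that start with the keyword (re.match anchors at the start).
    if re.match(r'foo:', line):
        # Second regex: pull every decimal number out of the kept line.
        results.append(re.findall(r'\d+\.\d+', line))

print(results)  # [['11.00', '12.00', '13.00'], ['11.00', '12.00']]
```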

Not exactly what you asked for, but since it's generally recommended to use plain Python string methods instead of regexes where they suffice, I'd do something like this:

    import re

    with open('numbers.txt', 'r') as f:
        # str.startswith filters the lines; only the number extraction uses a regex.
        results = [re.findall(r'\d+\.\d+', line) for line in f if line.startswith('foo:')]
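The prefix passed to startswith matters: checking only 'foo' also accepts lines such as "foobar: ...", while 'foo:' matches the keyword exactly. The sketch below shows the difference on an in-memory list; the "foobar: 99.00" line is a hypothetical addition, not part of the question's data.

```python
import re

NUMBER_RE = re.compile(r'\d+\.\d+')

# "foobar: 99.00" is a hypothetical extra line used only to show the pitfall.
lines = [
    "foo: 11.00 12.00  bar 13.00",
    "foobar: 99.00",
    "foo: 11.00 12.00",
]

# Prefix 'foo' also lets "foobar:" through.
loose = [NUMBER_RE.findall(line) for line in lines if line.startswith('foo')]
# Prefix 'foo:' keeps only lines that start with the exact keyword.
strict = [NUMBER_RE.findall(line) for line in lines if line.startswith('foo:')]

print(loose)   # [['11.00', '12.00', '13.00'], ['99.00'], ['11.00', '12.00']]
print(strict)  # [['11.00', '12.00', '13.00'], ['11.00', '12.00']]
```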

UPDATE

And this will return the numbers after 'foo' even if it appears anywhere in the line rather than just at the beginning (lines that don't contain 'foo' give an empty list):

    import re

    with open('numbers.txt', 'r') as f:
        results = [re.findall(r'\d+\.\d+', line.partition('foo')[2]) for line in f]
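To make the partition behaviour concrete, here is a small sketch using the sample lines as an in-memory list. str.partition returns a (head, separator, tail) tuple and the tail is an empty string when 'foo' is absent, so non-matching lines produce [] rather than being skipped, and numbers before the first 'foo' are dropped. The helper name numbers_after_foo and the line "bar: 1.00 foo: 2.00" are hypothetical, added only for illustration.

```python
import re

NUMBER = r'\d+\.\d+'

def numbers_after_foo(line):
    """Hypothetical helper: numbers that appear after the first 'foo' ([] if absent)."""
    # partition('foo') -> (head, 'foo', tail); tail is '' when 'foo' is missing.
    return re.findall(NUMBER, line.partition('foo')[2])

sample = [
    "foo: 11.00 12.00  bar 13.00",
    "bar: 11.00 12.00 bar",
    "foo: 11.00 12.00",
]
per_line = [numbers_after_foo(line) for line in sample]
print(per_line)  # [['11.00', '12.00', '13.00'], [], ['11.00', '12.00']]

# 'foo' in the middle of a line: only the numbers after it are returned.
print(numbers_after_foo("bar: 1.00 foo: 2.00"))  # ['2.00']
```

To get exactly the result asked for in the question, the empty lists can be filtered out, e.g. [nums for nums in per_line if nums].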

If every "foo:" line always contains the same number of numbers (three in this pattern), you can use the following regex:

    r"^foo:[^\d]*(\d*\.\d*)[^\d]*(\d*\.\d*)[^\d]*(\d*\.\d*)"

Example:

    >>> import re
    >>> line = "foo: 11.00 12.00 bar 13.00"
    >>> re.match(r"^foo:[^\d]*(\d*\.\d*)[^\d]*(\d*\.\d*)[^\d]*(\d*\.\d*)", line).groups()
    ('11.00', '12.00', '13.00')

Putting parentheses around part of a regular expression turns it into a group whose text can be extracted from the match object. See the documentation of Python's re module (Match Objects) for more information.
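A short sketch of reading the groups back from the match object, reusing the fixed-count pattern from this answer as a raw string. It also shows the limitation of that approach: the third sample line has only two numbers, so the three-group pattern does not match, re.match returns None, and calling .groups() on the result would raise AttributeError.

```python
import re

# The fixed-count pattern from the answer above, written as a raw string.
THREE_NUMBERS = re.compile(r"^foo:[^\d]*(\d*\.\d*)[^\d]*(\d*\.\d*)[^\d]*(\d*\.\d*)")

m = THREE_NUMBERS.match("foo: 11.00 12.00 bar 13.00")
print(m.group(1))     # 11.00  (first parenthesised group)
print(m.group(2, 3))  # ('12.00', '13.00')
print(m.groups())     # ('11.00', '12.00', '13.00')

# Only two numbers: each group needs its own '.', so there is no match at all.
print(THREE_NUMBERS.match("foo: 11.00 12.00"))  # None
```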

You can do without the first regexp: filter the lines in a list comprehension by comparing the first four characters of each line with "foo:", and compile the inner regexp once:

    import re

    with open("input.txt", "r") as inp:
        prog = re.compile(r"\d+\.\d+")
        results = [prog.findall(line) for line in inp if line[:4] == "foo:"]
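The same idea can be wrapped in a function that accepts any iterable of lines, so it works with an open file and can also be checked against the sample data without creating one. This is a minimal sketch; the name foo_numbers is hypothetical. Note that the re module already caches recently used patterns, so compiling up front mainly gives the pattern a name and avoids repeated cache lookups.

```python
import re

# Compiled once and reused for every line.
NUMBER_RE = re.compile(r"\d+\.\d+")

def foo_numbers(lines):
    """Hypothetical helper: numbers from every line that starts with 'foo:'.

    `lines` may be an open file or any other iterable of strings.
    """
    return [NUMBER_RE.findall(line) for line in lines if line[:4] == "foo:"]

sample = [
    "foo: 11.00 12.00  bar 13.00\n",
    "bar: 11.00 12.00 bar\n",
    "foo: 11.00 12.00\n",
]
print(foo_numbers(sample))  # [['11.00', '12.00', '13.00'], ['11.00', '12.00']]
```

With a real file this becomes: with open("input.txt") as inp: results = foo_numbers(inp).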
