
Find pattern substring using Python re

I am trying to find all substrings within a multi-line string in Python 3. I want to find all the words that sit between occurrences of the word 'Colour:'.

Example string:

str = """
Colour: Black
Colour: Green
Colour: Black
Colour: Red
Colour: Orange
Colour: Blue
Colour: Green
"""

I want to get all of the colours into a list like:

x = ['Black', 'Green', 'Black', 'Red', 'Orange', 'Blue', 'Green']

I want to do this using Python re.

What's the fastest way of doing this: re.search, re.findall, re.finditer, or even another method?

I've tried doing this as a list comprehension:

z = [x.group() for x in re.finditer('Colour:(.*?)Colour:', str)]

but it returns an empty list.

Any ideas?

In regex, the dot . does not match a newline by default. This means your pattern only matches when two 'Colour:' markers appear on the same line, something like "Colour: BlueColour:", which never occurs in your string.
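To see the effect, here is a minimal sketch on a shortened sample (the variable name `text` and the three-line sample are assumptions for illustration). It also shows that turning on re.DOTALL alone does not rescue the original pattern, because each match consumes the second 'Colour:' marker:

```python
import re

# Shortened sample; 'text' is used instead of 'str' to avoid shadowing the builtin.
text = """
Colour: Black
Colour: Green
Colour: Red
"""

# Default behaviour: '.' stops at every newline, so no line holds two markers.
default_matches = re.findall(r'Colour:(.*?)Colour:', text)

# With re.DOTALL, '.' also matches '\n', but the match swallows the next
# 'Colour:' marker, so the values that follow it are skipped.
dotall_matches = re.findall(r'Colour:(.*?)Colour:', text, flags=re.DOTALL)

print(default_matches)  # []
print(dotall_matches)   # [' Black\n']
```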

To overcome this, you can match one line at a time instead:

colours = re.findall(r'Colour: (.+)', str)

Note that re.findall returns the captured groups directly, so no list comprehension is needed.
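As a self-contained sketch of that approach (the variable name `text` is an assumption, chosen so the builtin `str` is not shadowed), with the re.finditer equivalent for comparison. Note that the attempt in the question called x.group(), which returns the whole match including 'Colour:'; group(1) returns only the captured part:

```python
import re

text = """
Colour: Black
Colour: Green
Colour: Black
Colour: Red
Colour: Orange
Colour: Blue
Colour: Green
"""

# One capture group: findall returns a list of the captured strings.
colours = re.findall(r'Colour: (.+)', text)

# Equivalent with finditer; group(1) is the text inside the parentheses.
colours_iter = [m.group(1) for m in re.finditer(r'Colour: (.+)', text)]

print(colours)
# ['Black', 'Green', 'Black', 'Red', 'Orange', 'Blue', 'Green']
```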

Furthermore, if the format won't change, regex is not mandatory and you can just split each line on spaces and take the second part. The condition skips blank lines (such as the one right after the opening triple quote), which would otherwise raise an IndexError:

colours = [line.split()[1] for line in str.splitlines() if line.strip()]
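One caveat with splitting on spaces is that a value containing a space would be cut short. A hedged variant, assuming every relevant line starts with 'Colour:', splits once on the colon instead (the 'Dark Blue' entry and the name `text` are made up for illustration):

```python
# Hypothetical sample with a two-word colour to show the difference.
text = """
Colour: Black
Colour: Dark Blue
Colour: Green
"""

colours = [
    line.split(":", 1)[1].strip()      # everything after the first colon
    for line in text.splitlines()
    if line.startswith("Colour:")      # ignore blank or unrelated lines
]

print(colours)  # ['Black', 'Dark Blue', 'Green']
```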

Alternatively, you can strip out the newlines and spaces and then split on a delimiter of your choosing, in your case Colour:. The filter(None, ...) call drops the empty string that the split leaves at the start.

list(filter(None, str.replace("\n", "").replace(" ", "").split("Colour:")))

Result:

['Black', 'Green', 'Black', 'Red', 'Orange', 'Blue', 'Green']

Regarding speed:

Regex patterns tend to take more time than working with string methods directly.

Adding the image for reference: [image not available]
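Since the speed claim depends on the input and the Python version, it is worth measuring rather than assuming. A rough sketch using timeit (the repeated sample data, the function names, and the iteration count are all assumptions; no particular result is implied):

```python
import re
import timeit

# Synthetic input: the same three lines repeated many times.
text = "\n" + "Colour: Black\nColour: Green\nColour: Red\n" * 1000

pattern = re.compile(r'Colour: (.+)')

def with_regex():
    return pattern.findall(text)

def with_split():
    # Skip empty lines so split()[1] never fails.
    return [line.split()[1] for line in text.splitlines() if line]

# Both approaches should agree before their timings are compared.
assert with_regex() == with_split()

regex_seconds = timeit.timeit(with_regex, number=100)
split_seconds = timeit.timeit(with_split, number=100)
print(f"regex: {regex_seconds:.4f}s  split: {split_seconds:.4f}s")
```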

Perhaps you just need a simple one-liner:

x = re.findall("Colour: (.*)",str)

This worked for your example.

(PS: please don't use builtin names like str as variable names.)
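A small illustration of why that matters (the function name `shadowed` is hypothetical): once `str` is rebound to a string, calling it as the builtin type fails in that scope.

```python
def shadowed():
    str = "Colour: Black"   # rebinds the name 'str' inside this function
    try:
        return str(42)      # no longer the builtin type, so this raises
    except TypeError as exc:
        return f"TypeError: {exc}"

message = shadowed()
print(message)
```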
