
Python Regex: Matching a phrase regardless of intermediate spaces

Given a phrase and a line of text, I need to match the phrase even if the words are separated by a different number of spaces in the line.

Thus, if the phrase is "the quick brown fox" and the line is "the   quick brown      fox jumped over the lazy dog", the instance of "the quick brown fox" should still be matched.

The method I already tried was to replace every run of whitespace in the phrase with a regex pattern for whitespace, but this doesn't always work if the phrase contains characters that regex doesn't treat as literals.

This should work:

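import re

# \s+ matches one or more whitespace characters between the words
pattern = r'the\s+quick\s+brown\s+fox'
text = 'the           quick      brown        fox jumped over the lazy dog'

# re.search finds the phrase anywhere in the line; re.match only checks the start
match = re.search(pattern, text)
if match:
    print(match.group(0))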

The output is:

the           quick      brown        fox
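If the phrase can contain characters that regex treats specially, as the question mentions, one option is to escape each word before joining the words with `\s+`. The following is a minimal sketch under that assumption; the helper name `phrase_pattern` and the sample line are made up for illustration.

```python
import re

def phrase_pattern(phrase):
    """Build a regex matching `phrase` with any run of whitespace between its words."""
    # re.escape makes metacharacters such as '.', '$' or '(' match literally.
    words = (re.escape(word) for word in phrase.split())
    return re.compile(r'\s+'.join(words))

line = 'is it   $5.00   (approx.)   or more?'
match = phrase_pattern('$5.00 (approx.)').search(line)
print(match.group(0))
```

Because each word is escaped on its own, the `\s+` separators stay active while everything else in the phrase is matched literally.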

You can use this regex:

(the\s+quick\s+brown\s+fox)
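One detail worth knowing about this pattern is that `\s` also matches newlines, so a phrase that is broken across two lines still matches. The sketch below illustrates this with a made-up sample text, and shows `[ \t]+` as a possible alternative when only spaces and tabs should count.

```python
import re

text = 'the quick   brown fox ran.\nThen the   quick\nbrown fox slept.'

# \s+ accepts any whitespace, including the newline in the second occurrence.
any_whitespace = re.compile(r'(the\s+quick\s+brown\s+fox)')
matches = any_whitespace.findall(text)
print(matches)

# [ \t]+ accepts only spaces and tabs, so the phrase must sit on one line.
same_line_only = re.compile(r'(the[ \t]+quick[ \t]+brown[ \t]+fox)')
same_line_matches = same_line_only.findall(text)
print(same_line_matches)
```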

You can split the given string on whitespace and join the pieces back together with a single space, so that you can then compare the result to the phrase you're looking for:

s = "the           quick      brown        fox"
' '.join(s.split()) == "the quick brown fox" # returns True

For the general case:

  1. Replace each run of whitespace characters with a single space character.
  2. Check whether the given phrase is a substring of the line after the replacement.

     import re

     def line_contains(pattern, line):
         # replace each run of whitespace with a single space
         line_without_spaces = re.sub(r'\s+', ' ', line)
         return pattern in line_without_spaces
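The two steps above can also be sketched without regex, using `str.split()` and `' '.join()` to normalize both the phrase and the line. The helper name `contains_phrase` and the sample lines are assumptions made for this illustration.

```python
def contains_phrase(phrase, line):
    """Return True if `phrase` occurs in `line`, ignoring the length of whitespace runs."""
    # str.split() with no arguments splits on any whitespace run and drops
    # leading/trailing whitespace, so joining with ' ' normalizes the spacing.
    normalized_phrase = ' '.join(phrase.split())
    normalized_line = ' '.join(line.split())
    return normalized_phrase in normalized_line

lines = [
    'the   quick brown      fox jumped over the lazy dog',
    'the slow brown fox stayed home',
]
results = [contains_phrase('the quick  brown fox', line) for line in lines]
print(results)
```

Note that this is a plain substring test, so it would also report a hit inside longer words (for example "bathe quick brown foxes"); a regex with word boundaries may be a better fit if that matters.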

As you later clarified, you need to match arbitrary lines and sequences of words. To that end, I added some more examples to show what the two similar regexes below do:

import re

text = """the           quick      brown        fox
another line                    with single and multiple            spaces
some     other       instance     with        six                      words"""

Matching whole lines

The first pattern matches a whole line at a time while iterating over the individual lines:

pattern1 = re.compile(r'((?:\w+)(?:\s+|$))+')
for i, line in enumerate(text.split('\n')):
    match = re.match(pattern1, line)
    print(i, match.group(0))

Its output is:

0 the           quick      brown        fox
1 another line                    with single and multiple            spaces
2 some     other       instance     with        six                      words
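The pattern above only covers a whole line when the line consists purely of `\w` words separated by whitespace. The sketch below shows how it behaves on other input and guards against `re.match` returning `None`; the helper name `leading_words` and the sample strings are illustrative assumptions.

```python
import re

pattern1 = re.compile(r'((?:\w+)(?:\s+|$))+')

def leading_words(line):
    """Return the leading run of whitespace-separated words, or None if there is none."""
    match = pattern1.match(line)
    return match.group(0) if match else None

print(leading_words('some     other   words'))   # the whole line matches
print(leading_words('stops here, at the comma'))  # the match ends before "here,"
print(leading_words('  leading spaces'))          # no match at the start of the line
```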

Matching single words

The second pattern matches single words and steps through them one by one while iterating over the individual lines:

pattern2 = re.compile(r'(\w+)(?:\s+|$)')
for i, line in enumerate(text.split('\n')):
    for m in re.finditer(pattern2, line):
        print(m.group(1))
    print()

Its output is:

the
quick
brown
fox

another
line
with
single
and
multiple
spaces

some
other
instance
with
six
words
