
How do I match a pattern of objects in a Python iterator?

It is very easy to use the Python re functions to match and manipulate patterns in text, for example:

re.match('a[efg]*c', 'aggggc')

How do I do the same thing on a list or other Python iterator? For example, I may have a list that looks like this:

>>> list = ['foo', 'bar', 3, (1, 2, 3), 'a', 'b', {5, 6, 7}, 'apple']

And, following the regex idiom, I may want to match a pattern like so:

>>> pattern = ['a', '[', {7, 6, 5}, 'b', 'c', ']', '*', 'apple']

and I want to find a match inside this list. If it were regex, I'd write it like this:

>>> match = re.search(pattern, list)
>>> match.group(0)
['a', 'b', {5, 6, 7}, 'apple']

But, of course, it doesn't work because Python regex expects to see a string.

How do I do this?

Note: it's the ability to match patterns that I'm looking for, not this exact syntax. I guess the ideal answer would be a module or library (or succinct function) that provides a variety of regex-style pattern matching tools that work on lists.

Explanation for why I want this: I'm working on scripts to process text from Southeast Asian languages that use complex scripts. The program I'm working on now will intelligently correct typing mistakes (this language has characters which can go above, below, in front of, or around others, and has specific rules about the order in which they can occur). The first pass of my program uses a state machine to assign each character to a class, such as consonant, vowel, tone, or number. The second pass will try to correct invalid syllables and other kinds of mistakes. There's no analogy in English as far as the syllable bit goes, but with numbers, suppose I saw the pattern ['number', 'o', 'number']; I would then presume that the typist meant 'zero' rather than 'oh' and make the proper correction.
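Since the first pass already reduces every character to a class label, one possible approach is to encode each label as a single character and let the ordinary re module do the matching on the encoded string. The sketch below only illustrates that idea: the CLASS_CODES mapping and the helper names encode and find_spans are hypothetical, and it assumes every label has a unique one-character code.

```python
import re

# Hypothetical mapping from class label to a unique single-character code.
CLASS_CODES = {"number": "N", "o": "o", "consonant": "C", "vowel": "V", "tone": "T"}

def encode(labels):
    """Turn a list of class labels into a string, one character per label."""
    return "".join(CLASS_CODES[label] for label in labels)

def find_spans(labels, regex):
    """Return (start, end) index pairs into `labels` for each regex match."""
    return [m.span() for m in re.finditer(regex, encode(labels))]

labels = ["consonant", "vowel", "number", "o", "number", "tone"]
# Lookbehind and lookahead keep the neighbouring numbers out of the match,
# so only the position of the suspicious 'o' is reported.
print(find_spans(labels, r"(?<=N)o(?=N)"))  # [(3, 4)]
```

Because each label maps to exactly one character, a match span in the encoded string is also a valid slice of the original label list.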

You can do something like this, checking whether each item is a str before trying to match it.

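import re
from collections.abc import Iterable  # `collections.Iterable` was removed in Python 3.10

pattern = re.compile('a[efg]*')
items = ['foo', 'bar', 3, (1, 2, 3), 'a', 'b', {5, 6, 7}, 'apple']

def _find_matches(it, pattern):
    """Recursively collect match objects for every string in `it` that matches `pattern`."""
    matches = []
    for i in it:
        if isinstance(i, str):
            # Strings are checked first: they are iterable too and must not be recursed into.
            m = pattern.match(i)
            if m:
                matches.append(m)
        elif isinstance(i, Iterable):
            # Nested containers (tuples, sets, lists) are searched recursively.
            m = _find_matches(i, pattern)
            matches.extend(m)
        else:
            print("Could not process: {}".format(i))
    return matches

results = _find_matches(items, pattern)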
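Note that this applies the regex to each string individually; it does not match a pattern across several list items. If it helps to know where each hit was found, a generator variant could yield the index path alongside the match. This is only a sketch: the name iter_matches and the extra nested tuple in items are added for illustration.

```python
import re
from collections.abc import Iterable

def iter_matches(items, pattern, path=()):
    """Yield (path, match) for every string in `items` that `pattern` matches."""
    for index, item in enumerate(items):
        here = path + (index,)
        if isinstance(item, str):
            m = pattern.match(item)
            if m:
                yield here, m
        elif isinstance(item, Iterable):
            # Recurse into nested containers, extending the index path.
            yield from iter_matches(item, pattern, here)
        # Anything else (numbers, None, ...) is skipped silently.

items = ['foo', 'bar', 3, (1, 2, 3), 'a', 'b', {5, 6, 7}, 'apple', ('x', 'age')]
pattern = re.compile('a[efg]*')
found = [(path, m.group(0)) for path, m in iter_matches(items, pattern)]
print(found)  # [((4,), 'a'), ((7,), 'a'), ((8, 1), 'age')]
```

For unordered containers such as sets, the last element of the path is just the iteration position, not a stable index.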

Mostly, you would need to write a function to check this, something like the following.

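import sys


my_list = ['foo', 'bar', 3, (1, 2, 3), 'a', 'b', {5, 6, 7}, 'apple']
# '*' in the pattern matches any single item at that position.
# Note: 'fo' != 'foo', so this example reports a mismatch at offset 0.
pattern = ['fo', 'bar', 3, (1, 2, 3), 'a', '*', {5, 6, 7}, 'apple']


# This simple check only compares lists of equal length, item by item.
if len(my_list) != len(pattern):
    print('List length does not match the pattern')
    sys.exit(1)

for offset, value in enumerate(my_list):
    if pattern[offset] != value and pattern[offset] != '*':
        print('Pattern matching failed at offset {} with value {}'.format(offset, value))
        break
else:
    # The for-else branch runs only when the loop finished without a break.
    print('Pattern matched perfectly.')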
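The check above compares two lists of equal length. To get closer to what the question asks for (searching for a pattern anywhere in the list, with a repeatable character-class-like element), a small backtracking matcher is one option. The following is a minimal sketch, not an existing library: Star, ANY, and search are hypothetical names, items are compared with ==, and only single-item wildcards and zero-or-more repetition are supported.

```python
class Star:
    """Pattern element: zero or more items, each equal to one of `choices`."""
    def __init__(self, *choices):
        self.choices = choices

ANY = object()  # Sentinel pattern element that matches any single item.

def _item_ok(expected, item):
    return expected is ANY or expected == item

def _match_here(pattern, seq, p, s):
    """Return the end index of a match of pattern[p:] starting at seq[s], or None."""
    if p == len(pattern):
        return s
    element = pattern[p]
    if isinstance(element, Star):
        # Greedy: consume as many items as possible, then backtrack.
        end = s
        while end < len(seq) and any(_item_ok(c, seq[end]) for c in element.choices):
            end += 1
        for stop in range(end, s - 1, -1):
            result = _match_here(pattern, seq, p + 1, stop)
            if result is not None:
                return result
        return None
    if s < len(seq) and _item_ok(element, seq[s]):
        return _match_here(pattern, seq, p + 1, s + 1)
    return None

def search(pattern, seq):
    """Return the first slice of `seq` that matches `pattern`, or None."""
    for start in range(len(seq) + 1):
        end = _match_here(pattern, seq, 0, start)
        if end is not None:
            return seq[start:end]
    return None

data = ['foo', 'bar', 3, (1, 2, 3), 'a', 'b', {5, 6, 7}, 'apple']
# Rough equivalent of the regex-like pattern a[{5,6,7}bc]*apple from the question.
pattern = ['a', Star({7, 6, 5}, 'b', 'c'), 'apple']
print(search(pattern, data))  # ['a', 'b', {5, 6, 7}, 'apple']
```

The recursion depth grows with the pattern length, not the list length, so this should be adequate for short patterns; a long or heavily nested pattern language would probably call for a proper automaton instead.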
