
Grep multi-layered iterable for strings that match (Python)

Say we have a multi-layered iterable with some strings at the "final" level (yes, strings are themselves iterable, but I think you get my meaning):

['something', 
('Diff',
('diff', 'udiff'),
('*.diff', '*.patch'),
('text/x-diff', 'text/x-patch')),

('Delphi',
('delphi', 'pas', 'pascal', 'objectpascal'),
('*.pas',),
('text/x-pascal',['lets', 'put one here'], )),

('JavaScript+Mako',
('js+mako', 'javascript+mako'),
('application/x-javascript+mako',
'text/x-javascript+mako',
'text/javascript+mako')),
...
]

Is there a convenient way to implement a search that gives me the indices of the matching strings? I would like something that behaves like this (where the above list is data):

>>> grep('javascript', data)

and it would return [ (3,1,1), (3,2,0), (3,2,1), (3,2,2) ] perhaps. Maybe I'm missing a comparable solution that returns something different but would still help me find strings within a multi-layered list of iterables of iterables of ... strings.

I wrote a little bit of code, but it seemed juvenile and inelegant, so I thought I would ask here. I guess I could keep nesting the exception handlers the way I started here, up to the number of levels the function should support, but I was hoping for something neat, abstract, and Pythonic.

import re

def rgrep(s, data):
    '''Given an iterable of strings or an iterable of iterables of strings,
    return the index/indices of the strings that contain the search string.

    Args:
        s - the string (regular expression) that you are searching for
        data - the iterable of strings or iterable of iterables of strings
    '''
    results = []
    expr = re.compile(s)
    for item in data:
        try:
            match = expr.search(item)
            if match is not None:
                results.append(data.index(item))

        except TypeError:
            # item is not a string, so look one level down
            for t in item:
                try:
                    m = expr.search(t)
                    if m is not None:
                        results.append((data.index(item), item.index(t)))

                except TypeError:
                    # you can only go 2 deep!
                    pass

    return results

I'd split recursive enumeration from grepping:

def enumerate_recursive(iterable, base=()):
    # Yield (index_path, string) for every string found at any depth.
    for index, item in enumerate(iterable):
        if isinstance(item, str):  # basestring on Python 2
            yield (base + (index,)), item
        else:
            for pair in enumerate_recursive(item, base + (index,)):
                yield pair

def grep_index(filt, pairs):
    # pairs is any iterable of (index, text) tuples
    return (index for index, text in pairs if filt in text)

This way you can do both non-recursive and recursive grepping:

import sys

l = list(grep_index('opt1', enumerate(sys.argv)))   # non-recursive
r = list(grep_index('diff', enumerate_recursive(your_data)))  # recursive

Also note that we're using iterators here, which saves RAM on longer sequences.
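To make the point about iterators concrete, here is a minimal sketch (assuming Python 3; the sample list is made up for illustration) that pulls only the first two matches and leaves the rest of the traversal undone until it is asked for:

```python
from itertools import islice

def enumerate_recursive(iterable, base=()):
    """Yield (index_path, string) for every string found at any depth."""
    for index, item in enumerate(iterable):
        path = base + (index,)
        if isinstance(item, str):
            yield path, item
        else:
            yield from enumerate_recursive(item, path)

def grep_index(filt, pairs):
    """Lazily yield the index of every (index, text) pair whose text contains filt."""
    return (index for index, text in pairs if filt in text)

# Illustrative data, not taken from the question.
sample = [('diff', 'udiff'), ('*.diff', '*.patch'), ('text/x-diff', ['deep diff'])]

matches = grep_index('diff', enumerate_recursive(sample))
first_two = list(islice(matches, 2))  # walks only as far as the second match
rest = list(matches)                  # resumes where islice stopped

print(first_two)
print(rest)
```

Nothing is materialised until list() or islice() asks for it, so a very large structure never has to be held as a full result list.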

An even more generic solution would be to pass a callable instead of a string to grep_index, but that might not be necessary for you.
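A rough sketch of that callable-based variant, assuming Python 3. The name grep_index_by and the sample list are hypothetical, introduced only for this illustration:

```python
import re

def enumerate_recursive(iterable, base=()):
    """Yield (index_path, string) for every string found at any depth."""
    for index, item in enumerate(iterable):
        path = base + (index,)
        if isinstance(item, str):
            yield path, item
        else:
            yield from enumerate_recursive(item, path)

def grep_index_by(predicate, pairs):
    """Hypothetical variant of grep_index: keep indices whose text satisfies predicate."""
    return (index for index, text in pairs if predicate(text))

# Illustrative data, shortened from the question.
sample = ['something',
          ('Diff', ('diff', 'udiff')),
          ('JavaScript+Mako', ('js+mako', 'javascript+mako'))]

# The bound search method of a compiled regex works as a predicate.
ignore_case = re.compile('javascript', re.IGNORECASE).search
regex_hits = list(grep_index_by(ignore_case, enumerate_recursive(sample)))

# So does any ordinary function or lambda.
prefix_hits = list(grep_index_by(lambda s: s.startswith('diff'),
                                 enumerate_recursive(sample)))

print(regex_hits)
print(prefix_hits)
```

With a callable, the caller decides what "match" means (substring, regex, prefix, length, and so on) without touching the traversal code.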

Here is a grep that uses recursion to search the data structure.

Note that good data structures lead the way to elegant solutions. Bad data structures make you bend over backwards to accommodate them. This feels to me like one of those cases where a bad data structure is obstructing rather than helping you.

Having a simpler data structure with a more uniform layout (instead of using this grep) might be worth investigating.

#!/usr/bin/env python

data=['something', 
('Diff',
('diff', 'udiff'),
('*.diff', '*.patch'),
('text/x-diff', 'text/x-patch',['find','java deep','down'])),

('Delphi',
('delphi', 'pas', 'pascal', 'objectpascal'),
('*.pas',),
('text/x-pascal',['lets', 'put one here'], )),

('JavaScript+Mako',
('js+mako', 'javascript+mako'),
('application/x-javascript+mako',
'text/x-javascript+mako',
'text/javascript+mako')),
]

def grep(astr, data, prefix=()):
    # Return the index path (a tuple) of every string in data that contains astr.
    result = []
    for idx, elt in enumerate(data):
        if isinstance(elt, str):  # basestring on Python 2
            if astr in elt:
                result.append(prefix + (idx,))
        else:
            result.extend(grep(astr, elt, prefix + (idx,)))
    return result

def pick(data, idx):
    # Follow an index path back down to the element it names.
    if idx:
        return pick(data[idx[0]], idx[1:])
    else:
        return data

idxs = grep('java', data)
print(idxs)
for idx in idxs:
    print('data[%s] = %s' % (idx, pick(data, idx)))
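As for the more uniform structure suggested above, one possible shape is a list of records with fixed, named fields. This is only a sketch: the field names (name, aliases, patterns, mimetypes) are my guesses at what each position means, the stray nested lists are dropped, and an empty tuple stands in for the group that the 'JavaScript+Mako' entry lacks.

```python
from collections import namedtuple

# Field names are assumptions for illustration; the post never names them.
Entry = namedtuple('Entry', ['name', 'aliases', 'patterns', 'mimetypes'])

entries = [
    Entry('Diff', ('diff', 'udiff'), ('*.diff', '*.patch'),
          ('text/x-diff', 'text/x-patch')),
    Entry('Delphi', ('delphi', 'pas', 'pascal', 'objectpascal'), ('*.pas',),
          ('text/x-pascal',)),
    Entry('JavaScript+Mako', ('js+mako', 'javascript+mako'), (),
          ('application/x-javascript+mako',
           'text/x-javascript+mako',
           'text/javascript+mako')),
]

def find(entries, needle):
    """Yield (entry_index, field_name, position) for each string containing needle."""
    for i, entry in enumerate(entries):
        if needle in entry.name:
            yield (i, 'name', 0)
        for field in ('aliases', 'patterns', 'mimetypes'):
            for j, text in enumerate(getattr(entry, field)):
                if needle in text:
                    yield (i, field, j)

hits = list(find(entries, 'javascript'))
print(hits)
```

Because every record has the same depth, two plain loops are enough and each hit says which field it came from, so no recursion or type checking is needed.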

To get the position, use enumerate():

>>> data = [('foo', 'bar', 'frrr', 'baz'), ('foo/bar', 'baz/foo')]
>>>
>>> for l1, v1 in enumerate(data):
...     for l2, v2 in enumerate(v1):
...         if 'foo' in v2:
...             print(l1, l2, v2)
...
0 0 foo
1 0 foo/bar
1 1 baz/foo

In this example I am using a simple substring match ('foo' in v2), but you would probably use a regex for the job.

Obviously, enumerate() can be nested to support more than 2 levels, as in your edited post.
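Rather than writing one nested loop per level, the same enumerate() idea can be applied recursively and combined with a regex. The following is a sketch under the assumption of Python 3; grep_paths is a hypothetical name, and the extra nesting in the sample data is added only to show depth beyond two levels:

```python
import re

def grep_paths(pattern, data, path=()):
    """Yield the index path of every string, at any depth, that the regex matches."""
    regex = re.compile(pattern)  # an already compiled pattern is returned unchanged
    for index, item in enumerate(data):
        here = path + (index,)
        if isinstance(item, str):
            if regex.search(item):
                yield here
        else:
            yield from grep_paths(regex, item, here)

# The answer's data with one deeper branch appended for illustration.
nested = [('foo', 'bar', 'frrr', 'baz'),
          ('foo/bar', 'baz/foo', ['deeper', ('foo',)])]

paths = list(grep_paths(r'^foo', nested))
print(paths)
```

Each yielded tuple can be followed index by index to reach the matching string, however deep it sits.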
