Grep multi-layered iterable for strings that match (Python)
Say that we have a multi-layered iterable with some strings at the "final" level (yes, strings are themselves iterable, but I think you get my meaning):
['something',
 ('Diff',
  ('diff', 'udiff'),
  ('*.diff', '*.patch'),
  ('text/x-diff', 'text/x-patch')),
 ('Delphi',
  ('delphi', 'pas', 'pascal', 'objectpascal'),
  ('*.pas',),
  ('text/x-pascal', ['lets', 'put one here'])),
 ('JavaScript+Mako',
  ('js+mako', 'javascript+mako'),
  ('application/x-javascript+mako',
   'text/x-javascript+mako',
   'text/javascript+mako')),
 ...
]
Is there any convenient way to implement a search that would give me the indices of the matching strings? I would like something that acts like this (where the above list is data):
>>> grep('javascript', data)
and it would return [(3, 1, 1), (3, 2, 0), (3, 2, 1), (3, 2, 2)], perhaps. Maybe I'm missing a comparable solution that returns nothing of the sort but can still help me find some strings within a multi-layered list of iterables of iterables of ... strings.
I wrote a little bit, but it seemed juvenile and inelegant, so I thought I would ask here. I guess I could just keep nesting the exception handlers the way I started here, up to the number of levels that the function would then support, but I was hoping to get something neat, abstract, and Pythonic.
import re

def rgrep(s, data):
    ''' given an iterable of strings or an iterable of iterables of strings,
    returns the index/indices of strings that contain the search string.

    Args::

        s - the string that you are searching for
        data - the iterable of strings or iterable of iterables of strings
    '''
    results = []
    expr = re.compile(s)
    for item in data:
        try:
            match = expr.search(item)
            if match is not None:
                results.append( data.index(item) )
        except TypeError:
            # item is not a string, so search one level down
            for t in item:
                try:
                    m = expr.search(t)
                    if m is not None:
                        results.append( (data.index(item), item.index(t)) )
                except TypeError:
                    # you can only go 2 deep!
                    pass
    return results
I'd split recursive enumeration from grepping:
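def enumerate_recursive(iterable, base=()):
    # Yield (index_path, string) pairs for every string nested inside iterable.
    for index, item in enumerate(iterable):
        if isinstance(item, str):  # on Python 2, test against basestring instead
            yield (base + (index,)), item
        else:
            for pair in enumerate_recursive(item, (base + (index,))):
                yield pair

def grep_index(filt, pairs):
    # Keep only the indices whose text contains the substring filt.
    return (index for index, text in pairs if filt in text)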
This way you can do both non-recursive and recursive grepping:
import sys

l = list(grep_index('opt1', enumerate(sys.argv)))  # non-recursive
r = list(grep_index('diff', enumerate_recursive(your_data)))  # recursive
Also note that we're using iterators here, which saves RAM on longer sequences. An even more generic solution would be to pass a callable instead of a string to grep_index, but that might not be necessary for you.
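The callable idea mentioned above could look roughly like this. It is only a sketch: grep_index_by is a hypothetical name, the sample list is a cut-down version of the question's data, and the code assumes Python 3.

```python
import re

def enumerate_recursive(iterable, base=()):
    """Yield (index_path, string) for every string nested inside iterable."""
    for index, item in enumerate(iterable):
        if isinstance(item, str):
            yield base + (index,), item
        else:
            yield from enumerate_recursive(item, base + (index,))

def grep_index_by(predicate, pairs):
    """Hypothetical variant of grep_index: keep indices whose text satisfies predicate."""
    return (index for index, text in pairs if predicate(text))

# Cut-down sample data, assumed for illustration only.
sample = ['something',
          ('Diff', ('diff', 'udiff')),
          ('JavaScript+Mako', ('js+mako', 'javascript+mako'))]

# A compiled pattern's search method works as the callable;
# it returns a match object (truthy) or None (falsy).
is_js = re.compile('javascript', re.IGNORECASE).search
js_hits = list(grep_index_by(is_js, enumerate_recursive(sample)))
# js_hits == [(2, 0), (2, 1, 1)]
```

Any one-argument function works as the predicate, so case-insensitive or regex matching needs no change to the enumeration code.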
Here is a grep that uses recursion to search the data structure.
Note that good data structures lead the way to elegant solutions. Bad data structures make you bend over backwards to accommodate them. This feels to me like one of those cases where a bad data structure is obstructing rather than helping you.
Having a simple data structure with a more uniform structure (instead of using this grep) might be worth investigating.
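As one illustration of what "more uniform" could mean, here is a sketch that assumes every entry is a record of a name, aliases, filename patterns, and MIME types. The Entry type and its field names are guesses at what the tuples hold, not something stated in the question. The stray 'something' string and the nested list are left out, and the third entry gets an empty filenames tuple because the question shows none.

```python
from collections import namedtuple

# Hypothetical record type; the field names are assumptions about the data.
Entry = namedtuple('Entry', 'name aliases filenames mimetypes')

entries = [
    Entry('Diff', ('diff', 'udiff'), ('*.diff', '*.patch'),
          ('text/x-diff', 'text/x-patch')),
    Entry('Delphi', ('delphi', 'pas', 'pascal', 'objectpascal'), ('*.pas',),
          ('text/x-pascal',)),
    Entry('JavaScript+Mako', ('js+mako', 'javascript+mako'), (),
          ('application/x-javascript+mako', 'text/x-javascript+mako',
           'text/javascript+mako')),
]

def grep_flat(needle, records):
    """Return (record_index, field_name, position) for each matching string."""
    hits = []
    for i, rec in enumerate(records):
        if needle in rec.name:
            hits.append((i, 'name', None))
        for field in ('aliases', 'filenames', 'mimetypes'):
            for j, text in enumerate(getattr(rec, field)):
                if needle in text:
                    hits.append((i, field, j))
    return hits

flat_hits = grep_flat('javascript', entries)
# flat_hits == [(2, 'aliases', 1), (2, 'mimetypes', 0),
#               (2, 'mimetypes', 1), (2, 'mimetypes', 2)]
```

With a fixed depth like this, two plain loops are enough and no recursion is needed. The recursive grep for the original nested data follows.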
#!/usr/bin/env python

data = ['something',
        ('Diff',
         ('diff', 'udiff'),
         ('*.diff', '*.patch'),
         ('text/x-diff', 'text/x-patch', ['find', 'java deep', 'down'])),
        ('Delphi',
         ('delphi', 'pas', 'pascal', 'objectpascal'),
         ('*.pas',),
         ('text/x-pascal', ['lets', 'put one here'])),
        ('JavaScript+Mako',
         ('js+mako', 'javascript+mako'),
         ('application/x-javascript+mako',
          'text/x-javascript+mako',
          'text/javascript+mako')),
        ]

def grep(astr, data, prefix=[]):
    # prefix is never mutated (prefix + [idx] builds a new list), so the default is safe
    result = []
    for idx, elt in enumerate(data):
        if isinstance(elt, str):  # on Python 2, test against basestring instead
            if astr in elt:
                result.append(tuple(prefix + [idx]))
        else:
            result.extend(grep(astr, elt, prefix + [idx]))
    return result

def pick(data, idx):
    # Follow the index tuple one level at a time until it is exhausted.
    if idx:
        return pick(data[idx[0]], idx[1:])
    else:
        return data

idxs = grep('java', data)
print(idxs)
for idx in idxs:
    print('data[%s] = %s' % (idx, pick(data, idx)))
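For looking an index tuple back up, a non-recursive alternative to pick is to fold operator.getitem over the tuple. This is only a sketch; pick_reduce is a hypothetical name and nested is a small stand-in for the data above.

```python
from functools import reduce
from operator import getitem

def pick_reduce(data, idx):
    """Follow idx one level at a time: data[idx[0]][idx[1]]..."""
    # With data as the initial value, an empty idx returns data itself.
    return reduce(getitem, idx, data)

# Small stand-in for the nested data, assumed for illustration only.
nested = ['something', ('Diff', ('diff', 'udiff'))]

first_alias = pick_reduce(nested, (1, 1, 0))
# first_alias == 'diff'
```

It behaves like the recursive pick for any index tuple that grep returns.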
To get the position, use enumerate():
>>> data = [('foo', 'bar', 'frrr', 'baz'), ('foo/bar', 'baz/foo')]
>>>
>>> for l1, v1 in enumerate(data):
...     for l2, v2 in enumerate(v1):
...         if 'f' in v2:
...             print(l1, l2, v2)
...
0 0 foo
0 2 frrr
1 0 foo/bar
1 1 baz/foo
In this example I am using a simple substring match ('foo' in bar), but you would probably use a regex for the job.
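Swapping the substring test for a regex might look like this. It is a sketch only: the pattern is an assumed example, and matches is a name introduced here to collect the results instead of printing them.

```python
import re

data = [('foo', 'bar', 'frrr', 'baz'), ('foo/bar', 'baz/foo')]
pattern = re.compile(r'^f')  # assumed pattern: strings that start with "f"

matches = []
for l1, v1 in enumerate(data):
    for l2, v2 in enumerate(v1):
        if pattern.search(v2):  # match object is truthy, None is falsy
            matches.append((l1, l2, v2))
# matches == [(0, 0, 'foo'), (0, 2, 'frrr'), (1, 0, 'foo/bar')]
```

Compiling the pattern once outside the loops avoids recompiling it for every string.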
Obviously, enumerate() can support more than 2 levels, as in your edited post.