
Finding consecutively repeating strings in Python list

What is the most efficient way to find consecutively repeating strings in a Python list?

For example, suppose I have the list ["a", "a", "b", "c", "b", "b", "b"]. I want output something like: ["group of 2 a's found at index 0", "group of 3 b's found at index 4"].

Is there a built-in function to accomplish this task? I did find numpy.bincount, but that seems to work only on numeric values.

Thanks in advance for the help.

It's funny that you should call it a group, because the function probably best suited to this is itertools.groupby:

>>> import itertools
>>> items = ["a", "a", "b", "c", "b", "b", "b"]
>>> [(k, sum(1 for _ in vs)) for k, vs in itertools.groupby(items)]
[('a', 2), ('b', 1), ('c', 1), ('b', 3)]

(sum(1 for _ in vs) is a count, by the way, since len doesn't work on just any iterable, and len(list(…)) is wasteful.)
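To illustrate the point about len, here is a small sketch that takes the first group from itertools.groupby and tries both approaches. The names grouped, len_works, and count are made up for this example.

```python
import itertools

items = ["a", "a", "b", "c", "b", "b", "b"]

# Keep a reference to the groupby object while we inspect its first group.
grouped = itertools.groupby(items)
key, group = next(grouped)

try:
    len(group)
    len_works = True
except TypeError:
    # The group is a lazy iterator without __len__, so len() raises TypeError.
    len_works = False

# Consuming the iterator and counting the items works on any iterable.
count = sum(1 for _ in group)

print(key, len_works, count)
```

The failed len call does not consume anything, so the count afterwards still sees every item in the group.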

Getting the index is a little more complicated; I'd just do it using a loop.

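import itertools

def group_with_index(l):
    # Yield (value, run length, start index) for each run of equal items.
    i = 0

    for k, vs in itertools.groupby(l):
        c = sum(1 for _ in vs)  # length of the current run
        yield (k, c, i)
        i += c  # the next run starts right after this one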
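As a usage sketch, the generator above can be combined with a filter on the count to produce the kind of output the question asks for. The count > 1 condition and the messages name are assumptions about what "repeating" should mean here; the function is repeated so the snippet runs on its own.

```python
import itertools

def group_with_index(l):
    # Same generator as above: yields (value, run length, start index).
    i = 0
    for k, vs in itertools.groupby(l):
        c = sum(1 for _ in vs)
        yield (k, c, i)
        i += c

items = ["a", "a", "b", "c", "b", "b", "b"]

# Keep only runs longer than one element and format them like the question.
messages = [
    "group of {0} {1}'s found at index {2}".format(count, key, index)
    for key, count, index in group_with_index(items)
    if count > 1
]

print(messages)
# ["group of 2 a's found at index 0", "group of 3 b's found at index 4"]
```

Dropping the count > 1 condition would report the single-element runs as well.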

This requires state information between iterations of the loop, so it's not easy to do with a list comprehension. Instead, you can keep track of the last value in a loop:

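items = ["a", "a", "b", "c", "b", "b", "b"]
groups = []
for i, val in enumerate(items):
    if i == 0:
        # The first element starts the first run.
        cnt = 1
        loc = i
        last_val = val
    elif val == last_val:
        cnt += 1
    else:
        # The run ended: record it and start a new one.
        groups.append((cnt, last_val, loc))
        cnt = 1
        loc = i
        last_val = val

# Record the final run, which the loop never closes on its own.
if items:
    groups.append((cnt, last_val, loc))

for group in groups:
    print("group of {0} {1}'s found at index {2}".format(*group))

Output:

group of 2 a's found at index 0
group of 1 b's found at index 2
group of 1 c's found at index 3
group of 3 b's found at index 4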


 