

Get number of keys matching a value in itertools.groupby

I have lists of binary values and I am trying to get the number of groups of consecutive 1s in each list.

Here are a few examples:

[0, 0, 0, 0, 0, 0, 0, 0] -> 0
[1, 1, 1, 1, 1, 1, 1, 1] -> 1
[0, 1, 1, 1, 1, 0, 0, 0] -> 1
[0, 1, 1, 1, 0, 0, 1, 0] -> 2

I use itertools.groupby() to split the lists into groups, which gets me an iterator of (key, group) pairs, but I can't quite figure out how to get the number of groups of 1s specifically.

Obviously, I could iterate over the keys and count up with an if statement, but I'm sure there's a better way. A minimal sketch of that straightforward loop is shown below (count_runs_of_ones is just a placeholder name).
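
from itertools import groupby

def count_runs_of_ones(lst):
    count = 0
    # groupby yields one (key, group) pair per run of equal values
    for key, _ in groupby(lst):
        if key == 1:
            count += 1
    return count

count_runs_of_ones([0, 1, 1, 1, 0, 0, 1, 0])  # 2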

While writing the question, I found the following solution (which was obvious in retrospect):

run_count = sum(k == 1 for k, g in itertools.groupby(labels_sample))

I'm not sure if it's the best, but it works.

In this specific case, since the keys can only be 0 and 1, you can omit the k == 1 check and simply include the zeros in the sum:

sum(k for k, _ in groupby([0, 1, 1, 1, 0, 0, 1, 0]))  # -> 2

Not with groupby, but to perhaps answer the request for "a better way", this appears to be faster:

def count_groups_of_ones(lst):
    it = iter(lst)
    count = 0
    while 1 in it:  # advance to the start of the next run of 1s
        count += 1
        0 in it     # skip over the rest of that run, up to the next 0
    return count
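
This works because a membership test on a plain iterator consumes elements until it finds the value (or exhausts the iterator), so 1 in it advances past the start of the next run of ones and 0 in it then skips to the end of that run. A small demonstration of the consuming behaviour:

it = iter([0, 1, 1, 1, 0, 0, 1, 0])
1 in it   # True - consumes the leading 0 and the first 1
list(it)  # [1, 1, 0, 0, 1, 0], the elements that remain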

Benchmark results for your four small lists:

  3.72 μs  with_groupby
  1.76 μs  with_in_iterator

And with longer lists (your lists multiplied by 1000):

984.32 μs  with_groupby
669.11 μs  with_in_iterator

Benchmark code (Try it online!):

def with_groupby(lst):
    return sum(k for k, _ in groupby(lst))

def with_in_iterator(lst):
    it = iter(lst)
    count = 0
    while 1 in it:
        count += 1
        0 in it
    return count

from timeit import repeat
from itertools import groupby
from collections import deque

funcs = [
    with_groupby,
    with_in_iterator,
]

def benchmark(lists, number):
    print('lengths:', *map(len, lists))
    for _ in range(3):
        for func in funcs:
            # deque(..., maxlen=0) just consumes the map iterator without storing results
            t = min(repeat(lambda: deque(map(func, lists), 0), number=number)) / number
            print('%6.2f μs ' % (t * 1e6), func.__name__)
        print()

lists = [
    [0, 0, 0, 0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 1, 1, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 0, 1, 0],
]

for func in funcs:
    print(*map(func, lists))
benchmark(lists, 10000)
benchmark([lst * 1000 for lst in lists], 40)

Another option that is more general:

def count_groups(lst, value):
    start = object()  # sentinel so the first element can count as a group start
    # pair each element with its predecessor; a group starts where the predecessor differs from value
    return sum((a is start or a != value) and b == value for a, b in zip([start] + lst, lst))

count_groups([0, 1, 1, 1, 0, 0, 1, 0], 1) # 2
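
Since the value to match is a parameter, the same function can, for example, count the runs of zeros:

count_groups([0, 1, 1, 1, 0, 0, 1, 0], 0)  # 3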

If optimizing for speed over a long list, try adapting this answer that uses numpy:

import numpy as np

def count_groups(lst, value):
    # each run of `value` produces exactly two transitions (one in, one out), hence // 2
    return np.diff(np.array(lst) == value, prepend=False, append=False).sum() // 2
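
As a quick check against the earlier examples (assuming numpy is installed):

count_groups([0, 1, 1, 1, 0, 0, 1, 0], 1)  # 2
count_groups([0, 1, 1, 1, 0, 0, 1, 0], 0)  # 3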
