
How to get the index and occurrence of each item using itertools.groupby()

Here's the story: I have two lists:

list_one=[1,2,9,9,9,3,4,9,9,9,9,2]
list_two=["A","B","C","D","A","E","F","G","H","Word1","Word2"]

I want to find the indices of consecutive 9's in list_one so that I can get the corresponding strings from list_two. I've tried:

group_list_one= [(k, sum(1 for i in g),pdn.index(k)) for k,g in groupby(list_one)]

I was hoping to get the index of the first 9 in each tuple and then go from there, but that did not work.

What can I do here? PS: I've looked at the documentation of itertools, but it seems very vague to me. Thanks in advance.

EDIT: Expected output is (key, occurrences, index_of_first_occurrence), something like:

[(9, 3, 2), (9, 4, 7)]

Judging by your expected output, give this a try:

from itertools import groupby

list_one=[1,2,9,9,9,3,4,9,9,9,9,2]
list_two=["A","B","C","D","A","E","F","G","H","Word1","Word2"]
data = zip(list_one, list_two)
i = 0
out = []

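for key, group in groupby(data, lambda x: x[0]):
    # Take the first pair of the run, then count the remaining ones.
    number, word = next(group)
    elems = len(list(group)) + 1
    if number == 9 and elems > 1:
        out.append((key, elems, i))
    i += elems  # advance to the start index of the next run

print(out)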

Output:

[(9, 3, 2), (9, 4, 7)]

But if you really wanted an output like this:

[(9, 3, 'C'), (9, 4, 'G')]

then look at this snippet:

from itertools import groupby

list_one=[1,2,9,9,9,3,4,9,9,9,9,2]
list_two=["A","B","C","D","A","E","F","G","H","Word1","Word2"]
data = zip(list_one, list_two)
out = []

for key, group in groupby(data, lambda x: x[0]):
    number, word = next(group)
    elems = len(list(group)) + 1
    if number == 9 and elems > 1:
        out.append((key, elems, word))

print(out)

Okay, I have a one-liner solution. It is ugly, but bear with me.

Let's consider the problem. We have a list that we want to summarize using itertools.groupby. groupby yields each key together with an iterator over its consecutive repetitions. At this stage we can't calculate the index, but we can easily find the number of occurrences.

[(key, len(list(it))) for (key, it) in itertools.groupby(list_one)]
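To make that concrete, here is a small runnable sketch of the comprehension applied to the question's list_one. The variable name run_lengths is only illustrative and does not appear in the original answer.

```python
import itertools

list_one = [1, 2, 9, 9, 9, 3, 4, 9, 9, 9, 9, 2]

# Each group is an iterator over one run of equal values; consuming it
# with list() lets us count the length of that run.
run_lengths = [(key, len(list(it))) for key, it in itertools.groupby(list_one)]

print(run_lengths)
# [(1, 1), (2, 1), (9, 3), (3, 1), (4, 1), (9, 4), (2, 1)]
```

Note that the two runs of 9 appear as separate entries, since groupby only merges adjacent equal items.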

Now, the real problem is that we want to calculate the indexes in relation to earlier data. In most common one-liner functions, we only examine the current state. However, there is one function that lets us take a glimpse at the past: reduce.

What reduce does is walk over the iterable and call a function with the previous result of that function and the next item. For example, reduce(lambda x,y: x*y, [2,3,4]) will calculate 2*3 = 6, then 6*4 = 24, and return 24. In addition, you can supply a different initial value for x instead of the first item.
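As a quick check of that description, the sketch below runs the same product with and without an initial value. It assumes Python 3, where reduce has to be imported from functools; the names product and product_from_ten are illustrative only.

```python
from functools import reduce  # in Python 3, reduce lives in functools

# Without an initial value, the first item (2) seeds the accumulator.
product = reduce(lambda x, y: x * y, [2, 3, 4])

# With an initial value of 10, the accumulator starts at 10 instead.
product_from_ten = reduce(lambda x, y: x * y, [2, 3, 4], 10)

print(product)           # 24
print(product_from_ten)  # 240
```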

Let's use it here: for each item, the index will be the previous index plus the previous number of occurrences. In order to have a valid list, we'll use [(0,0,0)] as the initial value. (We get rid of it in the end.)

reduce(lambda lst,item: lst + [(item[0], item[1], lst[-1][1] + lst[-1][-1])], 
       [(key, len(list(it))) for (key, it) in itertools.groupby(list_one)], 
       [(0,0,0)])[1:]

If we don't want to add an initial value, we can instead sum the numbers of occurrences that have appeared so far.

reduce(lambda lst,item: lst + [(item[0], item[1], sum(map(lambda i: i[1], lst)))],
       [(key, len(list(it))) for (key, it) in itertools.groupby(list_one)], [])

Of course, this gives us all the numbers. If we want only the 9's, we can wrap the whole thing in filter:

filter(lambda item: item[0] == 9, ... )
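Putting the pieces of this answer together, a runnable sketch might look like the following. It assumes Python 3 (reduce imported from functools, and filter wrapped in list() because it returns an iterator there); the intermediate names run_lengths, with_index and nines are illustrative and were not part of the original one-liner.

```python
from functools import reduce
from itertools import groupby

list_one = [1, 2, 9, 9, 9, 3, 4, 9, 9, 9, 9, 2]

# Step 1: (key, count) for every run of equal values.
run_lengths = [(key, len(list(it))) for key, it in groupby(list_one)]

# Step 2: append the start index, computed from the previous tuple as
# previous count + previous start index. The (0, 0, 0) seed is sliced off.
with_index = reduce(
    lambda lst, item: lst + [(item[0], item[1], lst[-1][1] + lst[-1][-1])],
    run_lengths,
    [(0, 0, 0)],
)[1:]

# Step 3: keep only the runs of 9.
nines = list(filter(lambda item: item[0] == 9, with_index))

print(nines)
# [(9, 3, 2), (9, 4, 7)]
```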

Well, this may not be the most elegant solution, but here goes:

g = groupby(enumerate(list_one), lambda x:x[1])
l = [(x[0], list(x[1])) for x in g if x[0] == 9]
[(x[0], len(x[1]), x[1][0][0]) for x in l]

which gives:

[(9, 3, 2), (9, 4, 7)]

This looks like a problem that is too complicated to cram into a list comprehension.

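import itertools

# yield is only valid inside a function, so the loop is wrapped in a
# generator; the name group_positions is illustrative.
def group_positions(list_one):
    element_index = 0  # the index in list_one of the first element in a group
    for element, occurrences in itertools.groupby(list_one):
        count = sum(1 for _ in occurrences)
        yield (element, count, element_index)
        element_index += count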

If you wanted to eliminate the element_index variable, think about what a cumulative_sum function would need to do, where its value depends on all previous values that have been iterated.
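One possible way to realize that cumulative-sum idea is itertools.accumulate from the standard library (Python 3.2+). This is a sketch under that assumption, not necessarily what the answer's author had in mind; the names runs, counts, starts and result are illustrative.

```python
from itertools import accumulate, groupby

list_one = [1, 2, 9, 9, 9, 3, 4, 9, 9, 9, 9, 2]

# (key, count) for every run of equal values.
runs = [(key, sum(1 for _ in group)) for key, group in groupby(list_one)]
counts = [count for _, count in runs]

# The start index of each run is the sum of the counts of all earlier runs:
# a running total shifted right by one position, starting at 0.
starts = [0] + list(accumulate(counts))[:-1]

result = [(key, count, start) for (key, count), start in zip(runs, starts)]
nines = [item for item in result if item[0] == 9]

print(nines)
# [(9, 3, 2), (9, 4, 7)]
```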
