How to get the index and occurrence of each item using itertools.groupby()

Question

Here's the story. I have two lists:

list_one=[1,2,9,9,9,3,4,9,9,9,9,2]
list_two=["A","B","C","D","A","E","F","G","H","Word1","Word2"]

I want to find the indices of consecutive 9s in list_one so that I can get the corresponding strings from list_two. I've tried:

group_list_one= [(k, sum(1 for i in g),list_one.index(k)) for k,g in groupby(list_one)]

I was hoping to get the index of the first 9 in each tuple and then go from there, but that did not work.

What can I do here? PS: I've looked at the itertools documentation, but it seems very vague to me. Thanks in advance.

EDIT: The expected output is (key, occurrences, index_of_first_occurrence), something like:

[(9, 3, 2), (9, 4, 7)]

Answer 1

Judging by your expected output, give this a try:

from itertools import groupby

list_one=[1,2,9,9,9,3,4,9,9,9,9,2]
list_two=["A","B","C","D","A","E","F","G","H","Word1","Word2"]
data = zip(list_one, list_two)
i = 0  # index in list_one where the current group starts
out = []

for key, group in groupby(data, lambda x: x[0]):
    number, word = next(group)  # first (number, word) pair of the group
    elems = len(list(group)) + 1  # remaining pairs plus the one already taken
    if number == 9 and elems > 1:
        out.append((key, elems, i))
    i += elems

print(out)

Output:

[(9, 3, 2), (9, 4, 7)]

But if you really wanted an output like this:

[(9, 3, 'C'), (9, 4, 'G')]

then look at this snippet:

from itertools import groupby

list_one=[1,2,9,9,9,3,4,9,9,9,9,2]
list_two=["A","B","C","D","A","E","F","G","H","Word1","Word2"]
data = zip(list_one, list_two)
out = []

for key, group in groupby(data, lambda x: x[0]):
    number, word = next(group)  # word is the string paired with the first item of the group
    elems = len(list(group)) + 1
    if number == 9 and elems > 1:
        out.append((key, elems, word))

print(out)

Answer 2

Okay, I have a one-liner solution. It is ugly, but bear with me.

Let's consider the problem. We have a list that we want to summarize using itertools.groupby. groupby yields each key together with an iterator over its consecutive repetitions. At this stage we can't calculate the index, but we can easily find the number of occurrences.

[(key, len(list(it))) for (key, it) in itertools.groupby(list_one)]
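
To make that intermediate result concrete, here is a minimal sketch of what the comprehension produces for the list_one from the question. The name run_lengths is only illustrative and does not come from the answer.

```python
import itertools

list_one = [1, 2, 9, 9, 9, 3, 4, 9, 9, 9, 9, 2]

# groupby yields one (key, iterator) pair per run of equal consecutive items.
# Each iterator has to be consumed before moving on to the next group.
run_lengths = [(key, len(list(it))) for key, it in itertools.groupby(list_one)]

print(run_lengths)
# [(1, 1), (2, 1), (9, 3), (3, 1), (4, 1), (9, 4), (2, 1)]
```

Note that the two runs of 9 stay separate, because groupby only merges items that are adjacent.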

Now, the real problem is that each index depends on the groups that came before it. Most of the functions commonly used in one-liners only examine the current item. However, there is one function that lets us take a glimpse at the past: reduce.

What reduce does is walk over the iterable and call a function with the previous result of that function and the next item. For example, reduce(lambda x,y: x*y, [2,3,4]) calculates 2*3 = 6, then 6*4 = 24, and returns 24. In addition, you can pass an initial value for x instead of starting from the first item.
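
As a small sketch of both forms, with and without an initial value (this assumes Python 3, where reduce is imported from functools; the variable names are illustrative):

```python
from functools import reduce

# Without an initial value, the first item seeds the result: (2*3)*4
product = reduce(lambda x, y: x * y, [2, 3, 4])

# With an initial value of 10, the fold starts from it: ((10*2)*3)*4
product_from_ten = reduce(lambda x, y: x * y, [2, 3, 4], 10)

print(product)           # 24
print(product_from_ten)  # 240
```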

Let's use it here: for each item, the index is the previous index plus the previous number of occurrences. To have a valid list to start from, we'll use [(0,0,0)] as the initial value. (We get rid of it at the end.)

import itertools
from functools import reduce  # needed on Python 3

reduce(lambda lst,item: lst + [(item[0], item[1], lst[-1][1] + lst[-1][-1])],
       [(key, len(list(it))) for (key, it) in itertools.groupby(list_one)],
       [(0,0,0)])[1:]

If we don't want to add an initial value, we can instead sum the numbers of occurrences that have appeared so far.

reduce(lambda lst,item: lst + [(item[0], item[1], sum(map(lambda i: i[1], lst)))],
       [(key, len(list(it))) for (key, it) in itertools.groupby(list_one)], [])

Of course, this gives us every number. If we want only the 9s, we can wrap the whole thing in filter:

filter(lambda item: item[0] == 9, ... )
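
Putting the pieces together, a runnable sketch of the whole approach might look like the following. It assumes Python 3 (hence the functools import and the list() around filter), and the names run_lengths, with_index and nines are illustrative, not part of the answer.

```python
from functools import reduce
from itertools import groupby

list_one = [1, 2, 9, 9, 9, 3, 4, 9, 9, 9, 9, 2]

# Step 1: (key, count) for every run of equal consecutive items.
run_lengths = [(key, len(list(it))) for key, it in groupby(list_one)]

# Step 2: fold into (key, count, start_index). Each start index is the
# previous count plus the previous start index; the (0, 0, 0) seed is
# dropped again by the [1:] slice.
with_index = reduce(
    lambda lst, item: lst + [(item[0], item[1], lst[-1][1] + lst[-1][-1])],
    run_lengths,
    [(0, 0, 0)],
)[1:]

# Step 3: keep only the runs of 9. filter returns an iterator on Python 3.
nines = list(filter(lambda item: item[0] == 9, with_index))

print(nines)  # [(9, 3, 2), (9, 4, 7)]
```

Splitting the one-liner into named steps costs a few lines but makes each intermediate value easy to inspect.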

Answer 3

Well, this may not be the most elegant solution, but here goes:

g = groupby(enumerate(list_one), lambda x:x[1])
l = [(x[0], list(x[1])) for x in g if x[0] == 9]
[(x[0], len(x[1]), x[1][0][0]) for x in l]

which gives

[(9, 3, 2), (9, 4, 7)]
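
Since the question's end goal is the corresponding strings from list_two, here is a hedged sketch of one way to get there from these tuples: slice list_two with each run's start index and length. The names runs, triples and words are illustrative only.

```python
from itertools import groupby

list_one = [1, 2, 9, 9, 9, 3, 4, 9, 9, 9, 9, 2]
list_two = ["A", "B", "C", "D", "A", "E", "F", "G", "H", "Word1", "Word2"]

# Group (index, value) pairs by value so each group remembers its positions.
groups = groupby(enumerate(list_one), lambda pair: pair[1])
runs = [(key, list(members)) for key, members in groups if key == 9]
triples = [(key, len(members), members[0][0]) for key, members in runs]

# Slice list_two using each run's start index and length.
words = [list_two[start:start + count] for _, count, start in triples]

print(triples)  # [(9, 3, 2), (9, 4, 7)]
print(words)    # [['C', 'D', 'A'], ['G', 'H', 'Word1', 'Word2']]
```

Slicing never raises when list_two is shorter than list_one (as it is here), it just returns fewer items.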

Answer 4

This looks like a problem that would be too complicated to stick into a list comprehension.

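import itertools

# yield is only valid inside a function, so the loop is wrapped in a
# generator; the name group_positions is illustrative.
def group_positions(list_one):
    element_index = 0  # the index in list_one of the first element in a group
    for element, occurrences in itertools.groupby(list_one):
        count = sum(1 for i in occurrences)
        yield (element, count, element_index)
        element_index += count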

If you wanted to eliminate the element_index variable, think about what a cumulative_sum function would need to do, where its value depends on all the previous values that have been iterated.
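
One possible take on that cumulative-sum idea, as a sketch rather than what the answer necessarily had in mind: on Python 3.2 and later, itertools.accumulate produces running totals, and shifting them by one position gives each group's starting index. The names runs, counts, starts and result are illustrative.

```python
from itertools import accumulate, groupby

list_one = [1, 2, 9, 9, 9, 3, 4, 9, 9, 9, 9, 2]

# (key, count) for every run of equal consecutive items.
runs = [(key, sum(1 for _ in group)) for key, group in groupby(list_one)]
counts = [count for _, count in runs]

# accumulate gives the running totals 1, 2, 5, 6, 7, 11, 12. Each run starts
# where the previous total ended, so prepend 0 and drop the final total.
starts = [0] + list(accumulate(counts))[:-1]

result = [(key, count, start) for (key, count), start in zip(runs, starts)]

print(result)
# [(1, 1, 0), (2, 1, 1), (9, 3, 2), (3, 1, 5), (4, 1, 6), (9, 4, 7), (2, 1, 11)]
```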

4 answers

solution1
5 ACCEPTED 2014-04-11 23:40:09

solution2
3 2014-04-12 00:15:56

solution3
2 2014-04-11 23:03:54

solution4
1 2014-04-11 23:38:18
