Getting the original index values from a grouped list with itertools groupby

Question

I have several lists of tuples generated by nltk.FreqDist(), like this:

totalist = [
    [('A', 12), ('C', 1)],   # index 0
    [('A', 25), ('X', 3)],   # index 1
    [('Z', 3), ('T', 2)],    # index 2
    [('Z', 10), ('M', 8)],   # index 3
    [('Z', 8), ('M', 8)],    # index 4
    [('C', 10), ('M', 8)],   # index 5
]

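I want to get each list's old index value, even after grouping with groupby.

This is my code so far. It does not do what I need, because groupby yields only the items themselves, so the original index is no longer available to print: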

from itertools import groupby

for key, group in groupby(totalist, lambda x: x[0][0]):
    for thing in group:
        # it should print thing's old index value here
        print(thing)
    print(" ")

Is there any way to solve this? Thanks in advance.

Answer 1

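Assuming the list is already sorted

groupby only groups consecutive items that share a key, so it assumes the list is already sorted (or at least grouped) by that key. The sample data satisfies this assumption. You can use enumerate to keep the original index and adjust the key function accordingly: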

from itertools import groupby

for key, group in groupby(enumerate(totalist), lambda x: x[1][0][0]):
    print(key)
    # each item is an (old_index, thing) pair produced by enumerate
    for old_index, thing in group:
        print('    ', old_index, thing)

Output:

A
     0 [('A', 12), ('C', 1)]
     1 [('A', 25), ('X', 3)]
Z
     2 [('Z', 3), ('T', 2)]
     3 [('Z', 10), ('M', 8)]
     4 [('Z', 8), ('M', 8)]
C
     5 [('C', 10), ('M', 8)]
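
To see why the sorted assumption matters, here is a minimal sketch using a made-up, deliberately unsorted list (`unsorted` is hypothetical and not part of the question). Because groupby starts a new group whenever the key changes, the same key can come back more than once:

```python
from itertools import groupby

# Hypothetical data, not from the question: the two 'A' lists are not adjacent.
unsorted = [
    [('A', 12), ('C', 1)],  # index 0
    [('Z', 3), ('T', 2)],   # index 1
    [('A', 25), ('X', 3)],  # index 2
]

# Collect (key, [old indices]) for every run of equal keys.
runs = [
    (key, [old_index for old_index, _ in group])
    for key, group in groupby(enumerate(unsorted), lambda x: x[1][0][0])
]
print(runs)
# [('A', [0]), ('Z', [1]), ('A', [2])]
```

Here 'A' shows up as two separate groups, which is the case the next section deals with.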

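Assuming an unsorted list

If the list needs to be sorted first, here is a modified solution. It is best to write one function and use it for both sorting and grouping: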

def key_function(x):
    return x[1][0][0]

Now use this function twice, so that sorting and grouping are consistent:

for key, group in groupby(sorted(enumerate(totalist), key=key_function), key_function):
    print(key)
    for old_index, thing in group:
        print('    old index:', old_index)
        print('    thing:', thing)

Output:

A
    old index: 0
    thing: [('A', 12), ('C', 1)]
    old index: 1
    thing: [('A', 25), ('X', 3)]
C
    old index: 5
    thing: [('C', 10), ('M', 8)]
Z
    old index: 2
    thing: [('Z', 3), ('T', 2)]
    old index: 3
    thing: [('Z', 10), ('M', 8)]
    old index: 4
    thing: [('Z', 8), ('M', 8)]
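
If only the original indices per key are needed, the same sort-then-group pattern can build a dictionary instead of printing. This is a sketch rather than part of the original answer: `indices_by_key` is a hypothetical helper name, and the data is copied from the question.

```python
from itertools import groupby

totalist = [
    [('A', 12), ('C', 1)],   # index 0
    [('A', 25), ('X', 3)],   # index 1
    [('Z', 3), ('T', 2)],    # index 2
    [('Z', 10), ('M', 8)],   # index 3
    [('Z', 8), ('M', 8)],    # index 4
    [('C', 10), ('M', 8)],   # index 5
]

def key_function(x):
    # x is an (old_index, thing) pair produced by enumerate;
    # the key is the label of the first tuple in thing.
    return x[1][0][0]

def indices_by_key(lists):
    """Map each key to the original indices of the lists that share it."""
    ordered = sorted(enumerate(lists), key=key_function)
    return {
        key: [old_index for old_index, _ in group]
        for key, group in groupby(ordered, key_function)
    }

print(indices_by_key(totalist))
# {'A': [0, 1], 'C': [5], 'Z': [2, 3, 4]}
```

Because sorted is stable, the indices inside each group stay in their original order.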

Solution 1
3 · Accepted · 2018-01-01 10:10:09
