Processing a list of lists: how to find all sublists that share the same last value?

Question

Given a list of lists:

lol = [[0,'a'], [0,'b'],
       [1,'b'], [1,'c'],
       [2,'d'], [2,'e'],
       [2,'g'], [2,'b'],
       [3,'e'], [3,'f']]

I would like to extract all sublists whose last element ( lol[n][1] ) also appears as the last element of at least one other sublist, and end up with something like this:

[0,'b']
[1,'b']
[2,'b']
[2,'e']
[3,'e']

I know that given two lists we can use an intersection. What is the right way to approach a problem like this, other than incrementing an index value in a for-each loop?

Answer 1

1. Using collections.defaultdict

You can use defaultdict to first group the items by their last element, then iterate over d.items() and keep only the groups that contain more than one item.

from collections import defaultdict


lol = [[0,'a'], [0,'b'],
       [1,'b'], [1,'c'],
       [2,'d'], [2,'e'],
       [2,'g'], [2,'b'],
       [3,'e'], [3,'f']]


d = defaultdict(list)

for v,k in lol:
    d[k].append(v)

# d looks like - 
# defaultdict(list,
#             {'a': [0],
#              'b': [0, 1, 2],
#              'c': [1],
#              'd': [2],
#              'e': [2, 3],
#              'g': [2],
#              'f': [3]})
    
result = [[v,k] for k,vs in d.items() for v in vs if len(vs)>1]
print(result)

[[0, 'b'], [1, 'b'], [2, 'b'], [2, 'e'], [3, 'e']]
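
Note that the result above comes out grouped by key rather than in the original order of lol. If the original order matters, one possible variant is sketched below. It is not part of the original answer: it swaps defaultdict for collections.Counter, and the function name shared_last is hypothetical. It counts the last elements first and then filters the rows in a single pass.

```python
from collections import Counter

lol = [[0, 'a'], [0, 'b'],
       [1, 'b'], [1, 'c'],
       [2, 'd'], [2, 'e'],
       [2, 'g'], [2, 'b'],
       [3, 'e'], [3, 'f']]


def shared_last(rows):
    """Return the rows whose last element is also the last element of another row."""
    # Count how often each last element occurs across all rows
    counts = Counter(row[-1] for row in rows)
    # Keep rows whose last element occurs more than once, in input order
    return [row for row in rows if counts[row[-1]] > 1]


result = shared_last(lol)
print(result)
# [[0, 'b'], [1, 'b'], [2, 'e'], [2, 'b'], [3, 'e']]
```

This makes two passes over the data and needs no index bookkeeping, which is what the question was trying to avoid.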

2. Using pandas Series.duplicated

Here is how you can do this with pandas:

Convert to a pandas DataFrame
For the key column, find the duplicates and keep all of them (keep=False)
Convert to a list of records while ignoring the index

import pandas as pd

df = pd.DataFrame(lol, columns=['val','key'])
dups = df[df['key'].duplicated(keep=False)]
result = list(dups.to_records(index=False))
print(result)

[(0, 'b'), (1, 'b'), (2, 'e'), (2, 'b'), (3, 'e')]
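
To make the middle step concrete, the sketch below shows the boolean mask that duplicated(keep=False) produces for the key column. As an alternative to to_records (a substitution, not what the answer above uses), it converts the filtered frame back to plain nested lists with to_numpy().tolist(). It assumes the same lol as above.

```python
import pandas as pd

lol = [[0, 'a'], [0, 'b'],
       [1, 'b'], [1, 'c'],
       [2, 'd'], [2, 'e'],
       [2, 'g'], [2, 'b'],
       [3, 'e'], [3, 'f']]

df = pd.DataFrame(lol, columns=['val', 'key'])

# keep=False marks every member of a duplicated group,
# not only the second and later occurrences
mask = df['key'].duplicated(keep=False)
print(mask.tolist())

# Select the rows where the mask is True and convert them
# back to a list of [val, key] lists, in original order
rows = df[mask].to_numpy().tolist()
print(rows)
```

With the default keep='first', the first occurrence of each key would be left unmarked, so [0, 'b'] and [2, 'e'] would be missing from the result.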

3. Using numpy.unique

You can solve this in a vectorized manner using numpy:

Convert to a numpy array arr
Find the unique elements u of the second column and their counts c
Keep only the unique elements that occur more than once, dup
Use broadcasting to compare the second column of the array against dup and take any over axis=0 to get a boolean mask that is True for duplicated rows
Filter arr based on this boolean mask

import numpy as np

arr = np.array(lol)

u, c = np.unique(arr[:,1], return_counts=True)
dup = u[c > 1]

result = arr[(arr[:,1]==dup[:,None]).any(0)]
result

array([['0', 'b'],
       ['1', 'b'],
       ['2', 'e'],
       ['2', 'b'],
       ['3', 'e']], dtype='<U21')
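
The broadcasting step can be unpacked as in the sketch below, which uses the same data. np.isin is shown as an equivalent membership test; it is a substitution for illustration, not what the answer above uses. Note also that because np.array(lol) mixes integers and strings, every element is coerced to a string, which is why the first column prints as '0', '1', and so on.

```python
import numpy as np

lol = [[0, 'a'], [0, 'b'],
       [1, 'b'], [1, 'c'],
       [2, 'd'], [2, 'e'],
       [2, 'g'], [2, 'b'],
       [3, 'e'], [3, 'f']]

arr = np.array(lol)  # mixed ints and strs are coerced to a string dtype

u, c = np.unique(arr[:, 1], return_counts=True)
dup = u[c > 1]       # keys that occur more than once

# dup[:, None] has shape (len(dup), 1). Comparing it with the key column of
# shape (10,) broadcasts to a (len(dup), 10) boolean matrix, one row per
# duplicated key.
matrix = arr[:, 1] == dup[:, None]

# Collapse over axis 0: True where a row's key equals any duplicated key
mask = matrix.any(axis=0)

# Equivalent membership test without explicit broadcasting
mask_isin = np.isin(arr[:, 1], dup)

print(matrix.shape)
print(np.array_equal(mask, mask_isin))
print(arr[mask].tolist())
```

If the integer column needs to stay numeric, keeping the two columns in separate arrays (or using the pandas approach) avoids the string coercion.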

Solution 1
1 2021-11-24 01:03:13
