
Process a list of lists, finding all lists that have matching last values?

Given a list of lists:

lol = [[0,'a'], [0,'b'],
       [1,'b'], [1,'c'],
       [2,'d'], [2,'e'],
       [2,'g'], [2,'b'],
       [3,'e'], [3,'f']]

I would like to extract all sublists that share their last element ( lol[n][1] ) with at least one other sublist, and end up with something like this:

[0,'b']
[1,'b']
[2,'b']
[2,'e']
[3,'e']

I know that given two lists we can use an intersection. What is the right way to approach a problem like this, other than incrementing an index value in a for-each loop?

1. Using collections.defaultdict

You can use defaultdict to first group your items by their last element, then iterate over d.items() and keep only the groups with more than one occurrence.

from collections import defaultdict


lol = [[0,'a'], [0,'b'],
       [1,'b'], [1,'c'],
       [2,'d'], [2,'e'],
       [2,'g'], [2,'b'],
       [3,'e'], [3,'f']]


d = defaultdict(list)

for v,k in lol:
    d[k].append(v)

# d looks like - 
# defaultdict(list,
#             {'a': [0],
#              'b': [0, 1, 2],
#              'c': [1],
#              'd': [2],
#              'e': [2, 3],
#              'g': [2],
#              'f': [3]})
    
# Keep only the keys that occur in more than one sublist
result = [[v, k] for k, vs in d.items() if len(vs) > 1 for v in vs]
print(result)
[[0, 'b'], [1, 'b'], [2, 'b'], [2, 'e'], [3, 'e']]
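
The grouping approach returns rows clustered by key. If the rows should instead stay in their original order, one possible variant counts the last elements first and then filters in a single pass. This is a minimal sketch using collections.Counter, which the answer above does not use; the function name rows_with_shared_last is hypothetical.

```python
from collections import Counter


def rows_with_shared_last(rows):
    """Return the rows whose last element appears in more than one row."""
    # Count how often each last element occurs across all rows.
    counts = Counter(row[-1] for row in rows)
    # Keep rows, in their original order, whose last element repeats.
    return [row for row in rows if counts[row[-1]] > 1]


lol = [[0, 'a'], [0, 'b'],
       [1, 'b'], [1, 'c'],
       [2, 'd'], [2, 'e'],
       [2, 'g'], [2, 'b'],
       [3, 'e'], [3, 'f']]

print(rows_with_shared_last(lol))
# [[0, 'b'], [1, 'b'], [2, 'e'], [2, 'b'], [3, 'e']]
```

Both versions make two passes over the data; the difference is only in the ordering of the result.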

2. Using pandas.Series.duplicated

Here is how you can do this with pandas:

  1. Convert to a pandas DataFrame
  2. For the key column, find the duplicates and keep all of them (keep=False)
  3. Convert to a list of records while ignoring the index

import pandas as pd

df = pd.DataFrame(lol, columns=['val','key'])
dups = df[df['key'].duplicated(keep=False)]
result = list(dups.to_records(index=False))
print(result)
[(0, 'b'), (1, 'b'), (2, 'e'), (2, 'b'), (3, 'e')]
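
The to_records call yields record objects that print as tuples. If nested lists matching the shape of the input are preferred, a possible variation is to convert the filtered frame with .values.tolist(). This is a sketch under the same column names as above ('val' and 'key'), not part of the original answer.

```python
import pandas as pd

lol = [[0, 'a'], [0, 'b'],
       [1, 'b'], [1, 'c'],
       [2, 'd'], [2, 'e'],
       [2, 'g'], [2, 'b'],
       [3, 'e'], [3, 'f']]

df = pd.DataFrame(lol, columns=['val', 'key'])

# True for every row whose key appears more than once (keep=False marks all copies).
mask = df['key'].duplicated(keep=False)

# Convert the surviving rows back to a list of lists.
result = df[mask].values.tolist()
print(result)
```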

3. Using numpy.unique

You can solve this in a vectorized manner using numpy:

  1. Convert to a numpy array arr
  2. Find the unique elements u and their counts c
  3. Filter the unique elements down to those that occur more than once, dup
  4. Use broadcasting to compare the second column of the array against dup, and take any over axis=0 to get a boolean mask that is True for duplicated rows
  5. Filter arr based on this boolean mask

import numpy as np

arr = np.array(lol)

u, c = np.unique(arr[:,1], return_counts=True)
dup = u[c > 1]

result = arr[(arr[:,1]==dup[:,None]).any(0)]
result
array([['0', 'b'],
       ['1', 'b'],
       ['2', 'e'],
       ['2', 'b'],
       ['3', 'e']], dtype='<U21')
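
Note that the integers come back as strings here, because np.array stores the mixed int/str input in a single string dtype. The broadcast comparison also builds an intermediate boolean matrix of shape (len(dup), len(arr)). As a sketch of an alternative that is not in the original answer, np.isin can produce the same row mask directly:

```python
import numpy as np

lol = [[0, 'a'], [0, 'b'],
       [1, 'b'], [1, 'c'],
       [2, 'd'], [2, 'e'],
       [2, 'g'], [2, 'b'],
       [3, 'e'], [3, 'f']]

# Mixed ints and strings are all stored as strings in one array.
arr = np.array(lol)

# Unique keys and how often each occurs; keep the keys seen more than once.
u, c = np.unique(arr[:, 1], return_counts=True)
dup = u[c > 1]

# True for each row whose key is one of the duplicated keys.
mask = np.isin(arr[:, 1], dup)
result = arr[mask]
print(result.tolist())
```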
