Process a list of lists, finding all lists that have matching last values?
Given a list of lists:
lol = [[0,'a'], [0,'b'],
       [1,'b'], [1,'c'],
       [2,'d'], [2,'e'],
       [2,'g'], [2,'b'],
       [3,'e'], [3,'f']]
I would like to extract all sublists whose last element (lol[n][1]) also appears as the last element of at least one other sublist, and end up with something like the following:
[0,'b']
[1,'b']
[2,'b']
[2,'e']
[3,'e']
I know that given two lists we can use an intersection. What is the right way to go about a problem like this, other than incrementing the index value in a for-each loop?
You can use defaultdict to first group your items by their last element, then iterate over d.items() and keep only the groups with more than one occurrence.
from collections import defaultdict
lol = [[0,'a'], [0,'b'],
[1,'b'], [1,'c'],
[2,'d'], [2,'e'],
[2,'g'], [2,'b'],
[3,'e'], [3,'f']]
d = defaultdict(list)
# Group the first elements under their shared last element
for v, k in lol:
    d[k].append(v)
# d looks like -
# defaultdict(list,
# {'a': [0],
# 'b': [0, 1, 2],
# 'c': [1],
# 'd': [2],
# 'e': [2, 3],
# 'g': [2],
# 'f': [3]})
result = [[v,k] for k,vs in d.items() for v in vs if len(vs)>1]
print(result)
[[0, 'b'], [1, 'b'], [2, 'b'], [2, 'e'], [3, 'e']]
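Note that the result above comes out grouped by key rather than in the original order of lol. If the original order matters, one possible variant (a minimal sketch, not part of the answer above) is to count the last elements with collections.Counter and then filter lol in a single pass. The names counts and pair are illustrative only.

```python
from collections import Counter

lol = [[0, 'a'], [0, 'b'],
       [1, 'b'], [1, 'c'],
       [2, 'd'], [2, 'e'],
       [2, 'g'], [2, 'b'],
       [3, 'e'], [3, 'f']]

# Count how often each last element occurs across all sublists.
counts = Counter(key for _, key in lol)

# Keep sublists whose last element occurs more than once, in input order.
result = [pair for pair in lol if counts[pair[1]] > 1]
print(result)  # [[0, 'b'], [1, 'b'], [2, 'e'], [2, 'b'], [3, 'e']]
```

This takes two passes over the data and, like the defaultdict approach, needs the last elements to be hashable.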
Here is how you can do this with Pandas:
import pandas as pd
df = pd.DataFrame(lol, columns=['val','key'])
dups = df[df['key'].duplicated(keep=False)]
result = list(dups.to_records(index=False))
print(result)
[(0, 'b'), (1, 'b'), (2, 'e'), (2, 'b'), (3, 'e')]
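The Pandas result is a list of record tuples. If plain nested lists like the ones in the question are preferred, a small variation (a sketch, assuming the same val and key column names) is to convert the filtered frame with to_numpy().tolist():

```python
import pandas as pd

lol = [[0, 'a'], [0, 'b'],
       [1, 'b'], [1, 'c'],
       [2, 'd'], [2, 'e'],
       [2, 'g'], [2, 'b'],
       [3, 'e'], [3, 'f']]

df = pd.DataFrame(lol, columns=['val', 'key'])

# keep=False marks every occurrence of a repeated key, not only the later ones.
dups = df[df['key'].duplicated(keep=False)]

# Convert the remaining rows to plain Python lists.
result = dups.to_numpy().tolist()
print(result)
```

The rows stay in their original order because the boolean mask only drops rows and never reorders them.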
You can solve this in a vectorized manner using numpy:
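1. Convert the list of lists to a numpy array arr
2. Get the unique elements u of the last column and their counts c
3. Build dup, the list of unique elements that occur more than once
4. Compare the last column of arr against dup using broadcasting, and filter the rows of arr based on this boolean mask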
import numpy as np
arr = np.array(lol)
u, c = np.unique(arr[:,1], return_counts=True)
dup = u[c > 1]
result = arr[(arr[:,1]==dup[:,None]).any(0)]
result
array([['0', 'b'],
['1', 'b'],
['2', 'e'],
['2', 'b'],
['3', 'e']], dtype='<U21')
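As the dtype in the output shows, np.array(lol) casts the integers to strings because the sublists mix types. If the integers need to stay integers, one option (a sketch, not the original answer's method) is to keep the two columns in separate arrays and build the mask with np.isin instead of explicit broadcasting. The names vals, keys and mask are illustrative.

```python
import numpy as np

lol = [[0, 'a'], [0, 'b'],
       [1, 'b'], [1, 'c'],
       [2, 'd'], [2, 'e'],
       [2, 'g'], [2, 'b'],
       [3, 'e'], [3, 'f']]

# Separate arrays so the integer column is not cast to strings.
vals = np.array([v for v, _ in lol])
keys = np.array([k for _, k in lol])

# Unique last elements and how often each one occurs.
u, c = np.unique(keys, return_counts=True)

# True for every row whose last element occurs more than once.
mask = np.isin(keys, u[c > 1])

# Rebuild plain Python pairs from the selected rows.
result = [[int(v), str(k)] for v, k in zip(vals[mask], keys[mask])]
print(result)
```

np.isin gives the same boolean mask as the broadcast comparison followed by .any(0), without materializing the intermediate two-dimensional array.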