How to get unique values in nested list along single column?
I need to extract only the unique sublists, based on the first element, from a nested list. For example:
inp = [['a','b'], ['a','d'], ['e','f'], ['g','h'], ['e','i']]
out = [['a','b'], ['e','f'], ['g','h']]
My method is to break the list into two lists and check the elements individually:
lis = [['a','b'], ['a','d'], ['e','f'], ['g','h']]
lisa = []
lisb = []
for i in lis:
    if i[0] not in lisa:
        lisa.append(i[0])
        lisb.append(i[1])
out = []
for i in range(len(lisa)):
    temp = [lisa[i], lisb[i]]
    out.append(temp)
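Most of the cost in this approach comes from i[0] not in lisa: a membership test on a list scans it element by element, so the loop does roughly quadratic work as lisa grows. Below is a minimal sketch that keeps the same two-list structure and changes only the lookup to a set. The name seen is hypothetical and is not part of the question's code.

```python
# Same two-list idea as the question's code, but the "already seen?" check
# uses a set (average O(1) lookup) instead of scanning the list lisa.
lis = [['a', 'b'], ['a', 'd'], ['e', 'f'], ['g', 'h']]

seen = set()   # hypothetical helper: first elements recorded so far
lisa = []      # first elements, in order of first appearance
lisb = []      # matching second elements

for row in lis:
    first, second = row[0], row[1]
    if first not in seen:
        seen.add(first)
        lisa.append(first)
        lisb.append(second)

# Rebuild the [first, second] pairs by zipping instead of indexing.
out = [[a, b] for a, b in zip(lisa, lisb)]
print(out)  # [['a', 'b'], ['e', 'f'], ['g', 'h']]
```

The extra set costs some memory, but it turns the whole pass into linear work on average.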
This is an expensive operation when dealing with lists of 10,00,000+ (over one million) sublists. Is there a better method?
Use a memory-efficient generator function with an auxiliary set object to filter items on the first unique subelement (take first unique). Membership tests on a set take O(1) time on average, compared with a linear scan for a list, so a single pass over the input is enough:
def gen_take_first(s):
    seen = set()
    for sub_l in s:
        if sub_l[0] not in seen:
            seen.add(sub_l[0])
            yield sub_l
inp = [['a','b'], ['a','d'], ['e','f'], ['g','h'], ['e','i']]
out = list(gen_take_first(inp))
print(out)
[['a', 'b'], ['e', 'f'], ['g', 'h']]
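An alternative sketch, not part of the original answer: if you need the result as a list anyway, a dict keyed on the first element does the same job in one pass. It relies on dicts preserving insertion order, which is guaranteed since Python 3.7. The function name take_first_dict is hypothetical.

```python
def take_first_dict(rows):
    """Return the first sublist seen for each distinct first element.

    Hypothetical helper; relies on dict insertion order (Python 3.7+).
    """
    first_by_key = {}
    for row in rows:
        # setdefault only inserts when the key is absent, so the first
        # sublist for each key is kept and later duplicates are ignored.
        first_by_key.setdefault(row[0], row)
    return list(first_by_key.values())


inp = [['a', 'b'], ['a', 'd'], ['e', 'f'], ['g', 'h'], ['e', 'i']]
print(take_first_dict(inp))  # [['a', 'b'], ['e', 'f'], ['g', 'h']]
```

Unlike the generator, this builds the whole result in memory at once, so the generator remains the better fit when you want to stream the output. In both versions the first elements must be hashable, as the strings here are.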