
Filter a list of tuples on the longest item in the tuple

Say I have this data:

my_list_of_tuples = [
    ('bill', [(4, ['626']), (4, ['253', '30', '626']),
              (4, ['253', '30', '626']), (4, ['626']),
              (4, ['626']), (4, ['626'])]),
    ('sarah', [(2, ['6']), (2, ['2', '6']), (2, ['2', '6']),
               (2, ['6']), (2, ['6']), (2, ['6'])]),
    ('fred', [(1, ['6']), (1, ['2']), (1, ['2'])])
]

For each person, I want to keep only the items whose inner list is the longest in that person's sub-list, with duplicates removed, so that I am left with:

my_output_list_of_tuples = [
    ('bill',  [(4, ['253', '30', '626'])]),
    ('sarah',  [(2, ['2', '6'])]),
    ('fred',  [(1, ['6']), (1, ['2'])])]

So far I tried:

my_output_list_of_tuples = [(x[0], max(x[1], key=lambda tup: len(tup[1]))) for x in my_list_of_tuples] 

but that does not work for fred, because the max function only returns one item. I also tried a few attempts with map and lambda, but got even less far.
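To make the tie behaviour concrete, here is a minimal sketch using only fred's sub-list from the data above. The variable names `fred` and `longest` are illustrative and not part of the original attempt.

```python
# fred's sub-list from the question: every inner list has length 1.
fred = [(1, ['6']), (1, ['2']), (1, ['2'])]

# max() returns a single element; when several items tie for the
# maximum, it returns the first one it encountered.
longest = max(fred, key=lambda tup: len(tup[1]))
print(longest)  # (1, ['6'])
```

So `(1, ['2'])` is lost even though it is just as long as `(1, ['6'])`.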

I'm OK with breaking it up into a loop like:

for my_list_of_tuples_by_person_name in my_list_of_tuples:
    # Do something with my_list_of_tuples_by_person_name[1]
    pass

Any ideas?

Thanks in advance :)

If you want to preserve duplicates like this, you can't just call max; you have to compare each value to the result of max.

The most readable way to do this is probably to build a dict mapping keys to max lengths, and then compare each tuple against that:

result = []
for name, sublist in my_list_of_tuples:
    d = {}
    for key, subsub in sublist:
        if len(subsub) > d.get(key, 0):
            d[key] = len(subsub)
    lst = [(key, subsub) for key, subsub in sublist if len(subsub) == d[key]]
    result.append((name, lst))

You can condense most parts of this, but that will probably only make things more opaque and less maintainable. Also notice that the naive way to condense a two-pass loop into a single expression (where you recalculate max each time through) turns it into a nested, quadratic loop, so it's going to be even more verbose than you think.
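As a rough illustration of that trade-off, the sketch below puts a condensed single-expression form next to a two-pass form. The function names `longest_naive` and `longest_two_pass` are made up for this example, and `data` is a shortened copy of the question's list.

```python
data = [
    ('sarah', [(2, ['6']), (2, ['2', '6']), (2, ['2', '6'])]),
    ('fred', [(1, ['6']), (1, ['2']), (1, ['2'])]),
]

def longest_naive(sublist):
    # Recomputes the per-key maximum for every element,
    # so the work grows quadratically with len(sublist).
    return [(key, subsub) for key, subsub in sublist
            if len(subsub) == max(len(s) for k, s in sublist if k == key)]

def longest_two_pass(sublist):
    # First pass: record the maximum length seen for each key.
    max_len = {}
    for key, subsub in sublist:
        max_len[key] = max(max_len.get(key, 0), len(subsub))
    # Second pass: keep every item that matches its key's maximum.
    return [(key, subsub) for key, subsub in sublist
            if len(subsub) == max_len[key]]

for name, sublist in data:
    # Both forms select the same items; only the cost differs.
    assert longest_naive(sublist) == longest_two_pass(sublist)
    print(name, longest_two_pass(sublist))
```

Both versions keep duplicates, as in the loop above; they differ only in how often the maximum is computed.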


Since you've completely changed the problem and now apparently want only the longest sublist (presumably picking arbitrarily when there are duplicates, or non-duplicate-but-same-length values?), things are simpler:

result = []
for name, sublist in my_list_of_tuples:
    keysubsub = max(sublist, key=lambda keysubsub: len(keysubsub[1]))
    result.append((name, keysubsub))

But that's basically what you already had. You say the problem with it is "… but that does not work for fred, because the max function only returns one item", but I'm not sure what you want instead of one item.


If what you're looking for is all distinct lists of the maximum length, you can use a set or OrderedSet instead of a list in the first answer. There's no OrderedSet in the stdlib, but Raymond Hettinger's OrderedSet recipe should be fine for our purposes. But let's do it manually with a set and a list:

result = []
for name, sublist in my_list_of_tuples:
    d = {}
    for key, subsub in sublist:
        if len(subsub) > d.get(key, 0):
            d[key] = len(subsub)
    lst, seen = [], set()
    for key, subsub in sublist:
        if len(subsub) == d[key] and tuple(subsub) not in seen:
            seen.add(tuple(subsub))
            lst.append((key, subsub))
    result.append((name, lst))

I think this last one provides exactly the output your updated question asks for, and doesn't do anything hard to understand to get there.
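If you would rather not track a separate set, a plain dict can stand in for an OrderedSet on Python 3.7+, where dicts are guaranteed to preserve insertion order. This is only a sketch of the de-duplication step on its own; `dedupe_keep_order` is a hypothetical helper name, and unlike the loop above it treats the key as part of an item's identity.

```python
def dedupe_keep_order(items):
    # Lists are unhashable, so (key, tuple(subsub)) serves as the dict key.
    # setdefault stores only the first occurrence of each distinct item.
    unique = {}
    for key, subsub in items:
        unique.setdefault((key, tuple(subsub)), (key, subsub))
    return list(unique.values())

print(dedupe_keep_order([(1, ['6']), (1, ['2']), (1, ['2'])]))
# [(1, ['6']), (1, ['2'])]
```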

You can use max:

my_list_of_tuples = [('bill', [(4, ['626']), (4, ['253', '30', '626']), (4, ['253', '30', '626']), (4, ['626']), (4, ['626']), (4, ['626'])]), ('sarah', [(2, ['6']), (2, ['2', '6']), (2, ['2', '6']), (2, ['6']), (2, ['6']), (2, ['6'])]), ('fred', [(1, ['6']), (1, ['2']), (1, ['2'])])]
final_result = [(a, [(c, d) for c, d in b if len(d) == max(map(len, [h for _, h in b]))]) for a, b in my_list_of_tuples]
new_result = [(a, [c for i, c in enumerate(b) if c not in b[:i]]) for a, b in final_result]

Output:

[('bill', [(4, ['253', '30', '626'])]), ('sarah', [(2, ['2', '6'])]), ('fred', [(1, ['6']), (1, ['2'])])]

First, define a function:

def f(ls):
    max_length = max(len(y) for (x, y) in ls)

    result = []

    for (x, y) in ls:
        if len(y) == max_length and (x, y) not in result:
            result.append((x, y))

    return result

Now call it like this:

>>> from pprint import pprint
>>> pprint([(name, f(y)) for name, y in my_list_of_tuples])
[('bill', [(4, ['253', '30', '626'])]),
 ('sarah', [(2, ['2', '6'])]),
 ('fred', [(1, ['6']), (1, ['2'])])]
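One edge case worth noting: max() raises ValueError on an empty sequence, so f([]) would fail. A possible variant, assuming an empty sub-list should simply give an empty result, uses the `default` argument that max() accepts since Python 3.4. The name `f_safe` is made up here.

```python
def f_safe(ls):
    # default=0 avoids the ValueError that max() raises on empty input.
    max_length = max((len(y) for _, y in ls), default=0)

    result = []
    for x, y in ls:
        # Keep items of maximum length, skipping ones already collected.
        if len(y) == max_length and (x, y) not in result:
            result.append((x, y))
    return result

print(f_safe([]))  # []
print(f_safe([(1, ['6']), (1, ['2']), (1, ['2'])]))  # [(1, ['6']), (1, ['2'])]
```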
