Sort Dict by value in List of nested Dict

This has been driving me nuts for days. I've seen several questions on sorting dicts by value, but they deal with simple structures and mine is complex.

My top-level dict key is a hash, the sub-dict key is a sequence number, and the sub-dict value is a list. The last value in that list is a number, which is what I want to sort the top-level dict by. The dict can be quite large, but here is a sample:

 {'16741b673a418af3812f6d43ea3f7daf': 
    {1: [0, '16741b673a418af3812f6d43ea3f7daf', 'data-01', 1132],
     2: [1, '16741b673a418af3812f6d43ea3f7daf', 'data-02', 1132],
     3: [2, '16741b673a418af3812f6d43ea3f7daf', 'data-03', 1132]},

 'cbef6de99cc2b9739c824db6d0246093':
    {4: [0, 'cbef6de99cc2b9739c824db6d0246093', 'data-04', 55296],
     5: [1, 'cbef6de99cc2b9739c824db6d0246093', 'data-05', 55296],
     6: [1, 'cbef6de99cc2b9739c824db6d0246093', 'data-06', 55296],
     7: [2, 'cbef6de99cc2b9739c824db6d0246093', 'data-07', 55296]},
 'a1e0f7ccdd8d38cb5ae00cdac71b6724':
    {8: [0, 'a1e0f7ccdd8d38cb5ae00cdac71b6724', 'data-08', 20125],
     9: [1, 'a1e0f7ccdd8d38cb5ae00cdac71b6724', 'data-09', 20125],
    10: [1, 'a1e0f7ccdd8d38cb5ae00cdac71b6724', 'data-10', 20125]}}

This code gives me the value I'm looking for, but only on the first iteration; after that I get KeyError: 1

for item1 in mydict.items():
    print(item1[1][1][3])

item1[1][1] returns subkey 1's list
item1[1][2] returns subkey 2's list
item1[1][3] returns subkey 3's list
item1[1][1][3] returns subkey 1's "value"

I want to be able to sort the dict forward and in reverse by that value. I've seen:

sorted(data.items(), key=lambda x:x[1])

I can't figure out how to apply that to my problem; my attempts generally end with KeyError: 1 or IndexError: string index out of range.

What am I missing? How can I reference that value in the lambda? Is that what I need to do?

I'd prefer not to use a solution that involves Pandas. I'm trying to make this fast and efficient since the data can be quite large (currently 10,000 subkeys).

Edit:

The output would look the same, but sorted by the last value in the list:

 {'16741b673a418af3812f6d43ea3f7daf': 
    {1: [0, '16741b673a418af3812f6d43ea3f7daf', 'data-01', 1132],
     2: [1, '16741b673a418af3812f6d43ea3f7daf', 'data-02', 1132],
     3: [2, '16741b673a418af3812f6d43ea3f7daf', 'data-03', 1132]},

 'a1e0f7ccdd8d38cb5ae00cdac71b6724':
    {8: [0, 'a1e0f7ccdd8d38cb5ae00cdac71b6724', 'data-08', 20125],
     9: [1, 'a1e0f7ccdd8d38cb5ae00cdac71b6724', 'data-09', 20125],
    10: [1, 'a1e0f7ccdd8d38cb5ae00cdac71b6724', 'data-10', 20125]},

 'cbef6de99cc2b9739c824db6d0246093':
    {4: [0, 'cbef6de99cc2b9739c824db6d0246093', 'data-04', 55296],
     5: [1, 'cbef6de99cc2b9739c824db6d0246093', 'data-05', 55296],
     6: [1, 'cbef6de99cc2b9739c824db6d0246093', 'data-06', 55296],
     7: [2, 'cbef6de99cc2b9739c824db6d0246093', 'data-07', 55296]}}

Your question is a bit unclear. What I understand is that you have {k1: {k2: [v1, v2, v3, v4]}} and want to sort the top-level entries by v4, which should be the same in every list of a given entry (so it doesn't matter which one we pick). However, the sub-entry keys (k2) are not constant between the top-level entries.

Getting v4 from a sub-entry is easy ([3] or [-1]); the issue is getting an arbitrary value of the second-level dict. next(iter(d.values())) ought to do: it iterates over the sub-values (the lists) and takes the first value out of the iterator. Note that this will raise an error (StopIteration) if a sub-entry is empty (a top-level key maps to an empty dict).

So sorted(data.items(), key=lambda e: next(iter(e[1].values()))[-1]) should work:
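As a minimal sketch of that lookup, the helper below guards against the empty sub-dict case. The name last_value and the default of None are assumptions made for illustration, not part of the original answer:

```python
def last_value(sub, default=None):
    """Return the last element of an arbitrary list stored in the sub-dict."""
    # Passing a default to next() avoids StopIteration on an empty sub-dict.
    first_list = next(iter(sub.values()), None)
    return default if first_list is None else first_list[-1]


sample = {
    4: [0, 'cbef6de99cc2b9739c824db6d0246093', 'data-04', 55296],
    5: [1, 'cbef6de99cc2b9739c824db6d0246093', 'data-05', 55296],
}

print(last_value(sample))
print(last_value({}))
```

Whether an empty sub-dict should sort first, sort last, or be rejected depends on the data; picking a numeric default such as 0 keeps the sort key comparable across entries.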

[('16741b673a418af3812f6d43ea3f7daf',
  {1: [0, '16741b673a418af3812f6d43ea3f7daf', 'data-01', 1132],
   2: [1, '16741b673a418af3812f6d43ea3f7daf', 'data-02', 1132],
   3: [2, '16741b673a418af3812f6d43ea3f7daf', 'data-03', 1132]}),
 ('a1e0f7ccdd8d38cb5ae00cdac71b6724',
  {8: [0, 'a1e0f7ccdd8d38cb5ae00cdac71b6724', 'data-08', 20125],
   9: [1, 'a1e0f7ccdd8d38cb5ae00cdac71b6724', 'data-09', 20125],
   10: [1, 'a1e0f7ccdd8d38cb5ae00cdac71b6724', 'data-10', 20125]}),
 ('cbef6de99cc2b9739c824db6d0246093',
  {4: [0, 'cbef6de99cc2b9739c824db6d0246093', 'data-04', 55296],
   5: [1, 'cbef6de99cc2b9739c824db6d0246093', 'data-05', 55296],
   6: [1, 'cbef6de99cc2b9739c824db6d0246093', 'data-06', 55296],
   7: [2, 'cbef6de99cc2b9739c824db6d0246093', 'data-07', 55296]})]

Be aware that this returns a list of (key, value) tuples, not a dictionary. You'll have to feed it back to dict to keep the order, ideally an OrderedDict; a regular dict also preserves insertion order in CPython 3.6 and, as a language guarantee, from Python 3.7 onwards:

{'16741b673a418af3812f6d43ea3f7daf': 
   {1: [0, '16741b673a418af3812f6d43ea3f7daf', 'data-01', 1132],
    2: [1, '16741b673a418af3812f6d43ea3f7daf', 'data-02', 1132],
    3: [2, '16741b673a418af3812f6d43ea3f7daf', 'data-03', 1132]},
 'a1e0f7ccdd8d38cb5ae00cdac71b6724': 
   {8: [0, 'a1e0f7ccdd8d38cb5ae00cdac71b6724', 'data-08', 20125],
    9: [1, 'a1e0f7ccdd8d38cb5ae00cdac71b6724', 'data-09', 20125],
    10: [1, 'a1e0f7ccdd8d38cb5ae00cdac71b6724', 'data-10', 20125]},
 'cbef6de99cc2b9739c824db6d0246093': {
    4: [0, 'cbef6de99cc2b9739c824db6d0246093', 'data-04', 55296],
    5: [1, 'cbef6de99cc2b9739c824db6d0246093', 'data-05', 55296],
    6: [1, 'cbef6de99cc2b9739c824db6d0246093', 'data-06', 55296],
    7: [2, 'cbef6de99cc2b9739c824db6d0246093', 'data-07', 55296]}}
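A rough sketch of that round trip, covering both sort directions the question asks for. The sample is shortened to one row per hash, and the helper name sort_key is an assumption for illustration:

```python
from collections import OrderedDict

data = {
    '16741b673a418af3812f6d43ea3f7daf': {
        1: [0, '16741b673a418af3812f6d43ea3f7daf', 'data-01', 1132]},
    'cbef6de99cc2b9739c824db6d0246093': {
        4: [0, 'cbef6de99cc2b9739c824db6d0246093', 'data-04', 55296]},
    'a1e0f7ccdd8d38cb5ae00cdac71b6724': {
        8: [0, 'a1e0f7ccdd8d38cb5ae00cdac71b6724', 'data-08', 20125]},
}


def sort_key(entry):
    # entry is a (hash, sub_dict) tuple; use the last element of any one list.
    return next(iter(entry[1].values()))[-1]


# sorted() returns a list of tuples; OrderedDict keeps that order explicitly.
ascending = OrderedDict(sorted(data.items(), key=sort_key))
descending = OrderedDict(sorted(data.items(), key=sort_key, reverse=True))

print(list(ascending))
print(list(descending))
```

Only the top-level entries are reordered; the sub-dicts are the same objects as in data, so nothing is copied beyond the references.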

Here's an ugly (and pretty inefficient) variant. It combines a dict comprehension, sorting, and getting the dict value corresponding to the first of the keys (via the ugly d[list(d.keys())[0]]):

>>> data.keys()
dict_keys(['16741b673a418af3812f6d43ea3f7daf', 'cbef6de99cc2b9739c824db6d0246093', 'a1e0f7ccdd8d38cb5ae00cdac71b6724'])
>>> data_sorted = {k: v for k, v in sorted(data.items(), key=lambda x: x[1][list(x[1].keys())[0]][3])}
>>> data_sorted.keys()
dict_keys(['16741b673a418af3812f6d43ea3f7daf', 'a1e0f7ccdd8d38cb5ae00cdac71b6724', 'cbef6de99cc2b9739c824db6d0246093'])

You get the KeyError in item1[1][1][3] because the key 1 (the second index) only exists in the sub-dictionary of '16741b673a418af3812f6d43ea3f7daf'.
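To make the failure concrete, here is a small sketch on a shortened copy of the sample data. The names missing and values are hypothetical, chosen only for this illustration:

```python
data = {
    '16741b673a418af3812f6d43ea3f7daf': {
        1: [0, '16741b673a418af3812f6d43ea3f7daf', 'data-01', 1132],
        2: [1, '16741b673a418af3812f6d43ea3f7daf', 'data-02', 1132]},
    'cbef6de99cc2b9739c824db6d0246093': {
        4: [0, 'cbef6de99cc2b9739c824db6d0246093', 'data-04', 55296],
        5: [1, 'cbef6de99cc2b9739c824db6d0246093', 'data-05', 55296]},
}

missing = []  # top-level keys whose sub-dict has no key 1
values = {}   # top-level key -> last value, found without hard-coding a sub-key

for top_key, sub in data.items():
    if 1 not in sub:
        # sub[1] would raise KeyError: 1 for this entry.
        missing.append(top_key)
    values[top_key] = next(iter(sub.values()))[-1]

print(missing)
print(values)
```

Looking the value up through the sub-dict's own values, rather than through a fixed sequence number, is what lets the same expression work for every top-level entry.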

Your dict contains an awful lot of duplication. It could be trimmed down to something like the following, which also makes the sorting expression much simpler:

>>> data = {("16741b673a418af3812f6d43ea3f7daf", 1132): ["data-01", "data-02", "data-03"],
...         ("cbef6de99cc2b9739c824db6d0246093", 55296): ["data-04", "data-05", "data-06", "data-07"],
...         ("a1e0f7ccdd8d38cb5ae00cdac71b6724", 20125): ["data-08", "data-09", "data-10"]}
>>>
>>> {k: v for k, v in sorted(data.items(), key=lambda x: x[0][1])}
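If the nested structure already exists, one possible way to derive the trimmed form is sketched below. The function name trim is an assumption, and the conversion is lossy: it discards the sequence numbers and the leading index of each list, as the trimmed layout above does.

```python
def trim(nested):
    """Collapse {hash: {seq: [idx, hash, name, value]}} into {(hash, value): [name, ...]}."""
    trimmed = {}
    for top_key, sub in nested.items():
        for row in sub.values():
            # row[-1] is the shared number, row[2] is the 'data-NN' label.
            trimmed.setdefault((top_key, row[-1]), []).append(row[2])
    return trimmed


nested = {
    'cbef6de99cc2b9739c824db6d0246093': {
        4: [0, 'cbef6de99cc2b9739c824db6d0246093', 'data-04', 55296],
        5: [1, 'cbef6de99cc2b9739c824db6d0246093', 'data-05', 55296]},
    '16741b673a418af3812f6d43ea3f7daf': {
        1: [0, '16741b673a418af3812f6d43ea3f7daf', 'data-01', 1132]},
}

trimmed = trim(nested)
# With the number stored in the key, the sort key is a plain tuple lookup.
by_value = dict(sorted(trimmed.items(), key=lambda x: x[0][1]))
print(by_value)
```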
