
Sort values of a list in a dictionary

I have over 9000 data attributes in a dictionary. A simple version looks like this:

test = {1092268: [81, 90], 524292: [80, 80], 892456: [88, 88]}

They are gene IDs, each with two values in a list. What I want is to end up with multiple dictionaries that contain all the IDs whose values are above or below a certain value. In this example, say I want three dictionaries: one with the IDs whose values are both below 85, one with the IDs whose values are both above 85, and a last one with the IDs whose first value is below 85 and second value is above 85. So I would end up with this:

testabove = {892456: [88, 88]}

and

testbelow = {524292: [80, 80]}

and

testboth = {1092268: [81, 90]}

I have no idea how to figure this out.

This is simple to do with dictionary comprehensions:

>>> testabove = {i:j for i,j in test.items() if j[0]>85 and j[1] > 85}
>>> testbelow = {i:j for i,j in test.items() if j[0]<85 and j[1] < 85}
>>> testboth = {i:j for i,j in test.items() if i not in testabove and i not in testbelow}
>>> testabove
{892456: [88, 88]}
>>> testbelow
{524292: [80, 80]}
>>> testboth
{1092268: [81, 90]}
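Because the comprehensions use strict < and >, a value of exactly 85 is neither above nor below, so such entries (and pairs like [90, 81], first above and second below) end up in testboth. Below is a minimal sketch that wraps the same three comprehensions in a function taking the threshold as a parameter. The name split_by_threshold is hypothetical, and the extra keys 111 and 222 are made-up entries added only to show the edge cases.

```python
def split_by_threshold(d, pivot):
    """Hypothetical helper: return (above, below, mixed) for {key: [a, b]}.

    Comparisons are strict, so a value equal to pivot, or a
    first-above/second-below pair, lands in `mixed`.
    """
    above = {k: v for k, v in d.items() if v[0] > pivot and v[1] > pivot}
    below = {k: v for k, v in d.items() if v[0] < pivot and v[1] < pivot}
    mixed = {k: v for k, v in d.items() if k not in above and k not in below}
    return above, below, mixed


# 111 and 222 are made-up entries that exercise the edge cases.
sample_data = {1092268: [81, 90], 524292: [80, 80], 892456: [88, 88],
               111: [85, 90], 222: [90, 81]}
above, below, mixed = split_by_threshold(sample_data, 85)
print(mixed)  # {1092268: [81, 90], 111: [85, 90], 222: [90, 81]}
```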

As Marein mentions in the comments below, another way to do it:

>>> test = {1092268: [81, 90], 524292: [80, 80], 892456: [88, 88]}
>>> testabove = {i:j for i,j in test.items() if all(x>85 for x in j)}
>>> testbelow = {i:j for i,j in test.items() if all(x<85 for x in j)}
>>> testabove
{892456: [88, 88]}
>>> testbelow
{524292: [80, 80]}

This uses the built-in all function.
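One advantage of the all version is that it does not hard-code two values per list. A short sketch (the helper name all_above is hypothetical) showing that it works for lists of any length, plus one edge case worth knowing: all() of an empty iterable is True, so an empty list would count as "above".

```python
def all_above(values, pivot):
    # Hypothetical helper: True only if every value is strictly greater than pivot
    return all(x > pivot for x in values)


print(all_above([88, 88], 85))      # True
print(all_above([81, 90], 85))      # False
print(all_above([86, 90, 99], 85))  # True: any list length works
print(all_above([], 85))            # True: all() of an empty iterable is True
```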

Comparison:

$ python -m timeit "test = {1092268: [81, 90], 524292: [80, 80], 892456: [88, 88]};testabove = {i:j for i,j in test.items() if all(x>85 for x in j)}"
100000 loops, best of 3: 2.29 usec per loop
$ python -m timeit "test = {1092268: [81, 90], 524292: [80, 80], 892456: [88, 88]};testabove = {i:j for i,j in test.items() if j[0]>85 and j[1] > 85}"
1000000 loops, best of 3: 0.99 usec per loop

As you can see, the straightforward way is faster than using all.
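The same comparison can also be run from inside Python with the standard timeit module. This is a sketch; unlike the command-line runs above, it times only the comprehension and not the construction of test, and absolute numbers will vary by machine and Python version, so none are claimed here. The function names with_all and with_and are illustrative.

```python
from timeit import timeit

test = {1092268: [81, 90], 524292: [80, 80], 892456: [88, 88]}


def with_all():
    return {i: j for i, j in test.items() if all(x > 85 for x in j)}


def with_and():
    return {i: j for i, j in test.items() if j[0] > 85 and j[1] > 85}


# Both variants must agree before their speed is worth comparing.
assert with_all() == with_and()

N = 100000
for fn in (with_all, with_and):
    seconds = timeit(fn, number=N)
    print(f"{fn.__name__}: {seconds / N * 1e6:.2f} usec per loop")
```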

Here's another solution that should be faster for a large amount of data, because it builds all three dictionaries in one iteration.

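def compare_both(xs, pivot):
    # -1 if both values are below pivot, 1 if both are above, else 0
    if xs[0] < pivot and xs[1] < pivot:
        return -1
    if xs[0] > pivot and xs[1] > pivot:
        return 1
    return 0

def sort_dict(d, pivot):
    # dicts[0]: both below, dicts[1]: mixed, dicts[2]: both above
    dicts = [{}, {}, {}]
    for key, value in d.items():
        dicts[compare_both(value, pivot) + 1][key] = value
    return dicts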
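A usage sketch for sort_dict. The point worth noticing is the order of the returned list: because compare_both returns -1, 0 or 1 and the code adds 1, the result is [below, mixed, above]. Both functions are repeated here so the snippet runs on its own.

```python
def compare_both(xs, pivot):
    # -1 if both values are below pivot, 1 if both are above, else 0
    if xs[0] < pivot and xs[1] < pivot:
        return -1
    if xs[0] > pivot and xs[1] > pivot:
        return 1
    return 0


def sort_dict(d, pivot):
    # dicts[0]: both below, dicts[1]: mixed, dicts[2]: both above
    dicts = [{}, {}, {}]
    for key, value in d.items():
        dicts[compare_both(value, pivot) + 1][key] = value
    return dicts


test = {1092268: [81, 90], 524292: [80, 80], 892456: [88, 88]}
testbelow, testboth, testabove = sort_dict(test, 85)
print(testbelow)  # {524292: [80, 80]}
print(testboth)   # {1092268: [81, 90]}
print(testabove)  # {892456: [88, 88]}
```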

Straightforward solution:

tmp = testboth, testabove, testbelow = {}, {}, {}
for k, v in test.items():
    tmp[(v[0] > 85 < v[1]) - (v[0] < 85 > v[1])][k] = v
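The index expression relies on two Python details: a > 85 < b is a chained comparison meaning a > 85 and 85 < b, and comparison results are bools, which are a subclass of int, so subtracting them gives -1, 0 or 1. Index -1 picks the last element of tmp, which is testbelow. A small sketch (the helper name bucket is hypothetical) that prints which dictionary each example value goes to:

```python
def bucket(v, pivot=85):
    # Chained comparisons: v[0] > pivot and pivot < v[1], etc.
    both_above = v[0] > pivot < v[1]
    both_below = v[0] < pivot > v[1]
    # bool - bool is an int: 1 (above), -1 (below) or 0 (anything else)
    return both_above - both_below


names = ('testboth', 'testabove', 'testbelow')  # same order as tmp
for v in ([81, 90], [88, 88], [80, 80]):
    idx = bucket(v)
    # Negative index -1 selects the last name, i.e. testbelow
    print(v, idx, names[idx])
```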

It's also faster than Bhargav's solution, judging by tests with random input of appropriate size. Test and results:

from random import randrange, sample
from timeit import timeit
test = {key: [randrange(170), randrange(170)] for key in sample(range(10000000), 9500)}
# Bhargav() and Stefan() (not shown) each run one of the two solutions on test.
print('Bhargav', timeit(Bhargav, number=100))
print('Stefan', timeit(Stefan, number=100))

Bhargav 1.87454111948
Stefan 1.2636884789

Two more variations; I'm not sure which I like best.

testboth, testabove, testbelow = tmp = {}, {}, {}
for k, (a, b) in test.items():
    tmp[(a > 85 < b) - (a < 85 > b)][k] = [a, b]

for k, v in test.items():
    a, b = v
    tmp[(a > 85 < b) - (a < 85 > b)][k] = v

Timings for these are about 1.7 and 1.2 seconds, respectively.
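Before comparing timings it is worth checking that the variations really produce the same dictionaries. A sketch with each loop wrapped in a function (the function names are illustrative). One visible difference is that the first variation stores a newly built [a, b] list per key while the second stores the original list object; whether that accounts for the timing gap is a guess, not something measured here.

```python
from random import randrange, sample, seed

seed(0)  # fixed seed so the run is reproducible
test = {key: [randrange(170), randrange(170)] for key in sample(range(10000000), 9500)}


def original(test):
    tmp = {}, {}, {}  # (testboth, testabove, testbelow)
    for k, v in test.items():
        tmp[(v[0] > 85 < v[1]) - (v[0] < 85 > v[1])][k] = v
    return tmp


def unpack_in_loop(test):
    tmp = {}, {}, {}
    for k, (a, b) in test.items():
        # Stores a new [a, b] list for every key
        tmp[(a > 85 < b) - (a < 85 > b)][k] = [a, b]
    return tmp


def unpack_in_body(test):
    tmp = {}, {}, {}
    for k, v in test.items():
        a, b = v
        # Stores the original list object v
        tmp[(a > 85 < b) - (a < 85 > b)][k] = v
    return tmp


assert original(test) == unpack_in_loop(test) == unpack_in_body(test)
```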

There are two basic options: one if you need an iterable, and one if you need a more permanent data structure. The more permanent solution is to initialize all three target dictionaries, then iterate through your source dict and sort each entry into the appropriate one.

target_dicts = {'aboveabove':{}, 'belowbelow':{}, 'belowabove':{}}
for k,v in src_dict.items():
    first = 'above' if v[0] > 85 else 'below'
    second = 'above' if v[1] > 85 else 'below'
    result = first+second  # 'aboveabove', 'belowbelow', etc...
    if result in target_dicts:
        target_dicts[result][k] = v
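Note that in the loop above, a pair whose first value is above 85 and second is at or below 85 produces the key 'abovebelow', which is not in target_dicts, so that entry is silently dropped. The question only asks for three dictionaries, so that may be intended. If you would rather keep every combination, here is a sketch of an alternative that keys the buckets on a tuple of booleans with collections.defaultdict (the name buckets is illustrative):

```python
from collections import defaultdict

src_dict = {1092268: [81, 90], 524292: [80, 80], 892456: [88, 88]}

# Key: (first value above 85, second value above 85).
# A value equal to 85 counts as "not above", matching the loop above.
buckets = defaultdict(dict)
for k, v in src_dict.items():
    buckets[(v[0] > 85, v[1] > 85)][k] = v

print(buckets[(True, True)])    # {892456: [88, 88]}
print(buckets[(False, False)])  # {524292: [80, 80]}
print(buckets[(False, True)])   # {1092268: [81, 90]}
```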

This will populate your target_dicts dictionaries appropriately. But maybe you don't need all of them? You might only need an iterator rather than actually building those dictionaries in memory. Let's use filter instead! (In Python 3, filter returns a lazy iterator over the matching keys; in Python 2 it returns a list.)

target_iterators = {
    'aboveabove': filter(
        lambda k: all(v > 85 for v in src_dict[k]), src_dict),
    'belowbelow': filter(
        lambda k: all(v <= 85 for v in src_dict[k]), src_dict),
    'belowabove': filter(
        lambda k: src_dict[k][0] <= 85 and src_dict[k][1] > 85, src_dict)}
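Two things to keep in mind when consuming these filter objects in Python 3: they yield keys, not key/value pairs, and they are single-use iterators. A short sketch (above_keys is an illustrative name) that rebuilds the dictionary only when needed and shows the iterator is exhausted afterwards:

```python
src_dict = {1092268: [81, 90], 524292: [80, 80], 892456: [88, 88]}

above_keys = filter(lambda k: all(v > 85 for v in src_dict[k]), src_dict)

# filter() yields keys lazily; look the values up only when you need them.
testabove = {k: src_dict[k] for k in above_keys}
print(testabove)  # {892456: [88, 88]}

# The iterator is now exhausted, so a second pass yields nothing.
print(list(above_keys))  # []
```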

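Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.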

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM