Sorting the values of lists in a dictionary

Question

I have more than 9000 data entries in a dictionary. A simplified version looks like this:

test = {1092268: [81, 90], 524292: [80, 80], 892456: [88, 88]}

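The keys are gene IDs, and each one maps to a list of two values. What I want to end up with is several dictionaries, each containing all the ids whose values are above or below a certain value. In this example, I want to end up with three dictionaries: one containing the ids whose values are both below 85, another containing the ids whose values are both above 85, and a last one where the first value is below 85 and the second value is above 85. So I would end up with this: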

testabove = { 892456: [88, 88]}

and

testbelow = { 524292: [80, 80]}

and

testboth = {1092268: [81, 90]}

I can't figure out how to do this.

Answer 1

This is easy to do with dictionary comprehensions:

>>> testabove = {i:j for i,j in test.items() if j[0]>85 and j[1] > 85}
>>> testbelow = {i:j for i,j in test.items() if j[0]<85 and j[1] < 85}
>>> testboth = {i:j for i,j in test.items() if i not in testabove and i not in testbelow}
>>> testabove
{892456: [88, 88]}
>>> testbelow
{524292: [80, 80]}
>>> testboth
{1092268: [81, 90]}
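
One thing to be aware of with the approach above: testboth is built as "everything that is not in the other two", so it would also collect entries the question did not ask for, such as a value exactly equal to 85 or a pair whose first value is above and whose second is below. A minimal sketch, with two made-up entries (keys 1 and 2) added to the sample data and an explicit condition as one possible alternative:

```python
# Sample data from the question plus two hypothetical entries (keys 1 and 2)
test = {1092268: [81, 90], 524292: [80, 80], 892456: [88, 88],
        1: [85, 85], 2: [90, 80]}

testabove = {k: v for k, v in test.items() if v[0] > 85 and v[1] > 85}
testbelow = {k: v for k, v in test.items() if v[0] < 85 and v[1] < 85}

# "Everything else" also picks up the tie and the above/below pair
leftover = {k: v for k, v in test.items()
            if k not in testabove and k not in testbelow}

# Explicit condition: first value below 85, second value above 85
strict_both = {k: v for k, v in test.items() if v[0] < 85 < v[1]}

print(leftover)     # {1092268: [81, 90], 1: [85, 85], 2: [90, 80]}
print(strict_both)  # {1092268: [81, 90]}
```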

As Marein mentioned in the comments below, here is another way:

>>> test = {1092268: [81, 90], 524292: [80, 80], 892456: [88, 88]}
>>> testabove = {i:j for i,j in test.items() if all(x>85 for x in j)}
>>> testbelow = {i:j for i,j in test.items() if all(x<85 for x in j)}
>>> testabove
{892456: [88, 88]}
>>> testbelow
{524292: [80, 80]}

This uses the all function.
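
Unlike the indexed version, the all version does not depend on each list having exactly two values. A small sketch, assuming hypothetical lists of varying length, none of them empty (all returns True for an empty iterable, so an empty list would land in both dictionaries):

```python
# Hypothetical data: lists of different lengths
data = {"a": [90, 91, 99], "b": [70, 60], "c": [80, 90, 100], "d": [86]}

above = {k: v for k, v in data.items() if all(x > 85 for x in v)}
below = {k: v for k, v in data.items() if all(x < 85 for x in v)}

print(above)  # {'a': [90, 91, 99], 'd': [86]}
print(below)  # {'b': [70, 60]}
```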

Timing comparison:

$ python -m timeit "test = {1092268: [81, 90], 524292: [80, 80], 892456: [88, 88]};testabove = {i:j for i,j in test.items() if all(x>85 for x in j)}"
100000 loops, best of 3: 2.29 usec per loop
$ python -m timeit "test = {1092268: [81, 90], 524292: [80, 80], 892456: [88, 88]};testabove = {i:j for i,j in test.items() if j[0]>85 and j[1] > 85}"
1000000 loops, best of 3: 0.99 usec per loop

As you can see, the direct method is faster than using all.
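
The same comparison can also be run from a script with the timeit module. This is only a sketch: the function names direct and with_all are made up here, and the absolute numbers will differ by machine and Python version, so no timings are shown.

```python
from timeit import timeit

test = {1092268: [81, 90], 524292: [80, 80], 892456: [88, 88]}

def direct():
    # Index both values explicitly
    return {i: j for i, j in test.items() if j[0] > 85 and j[1] > 85}

def with_all():
    # Generator expression inside all()
    return {i: j for i, j in test.items() if all(x > 85 for x in j)}

# Both variants must select the same entries before timing means anything
assert direct() == with_all() == {892456: [88, 88]}

t_direct = timeit(direct, number=100000)
t_all = timeit(with_all, number=100000)
print("direct:", t_direct)
print("all:   ", t_all)
```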

Answer 2

Here is another solution that should be faster for large amounts of data, because it builds all three dictionaries in a single iteration.

def compare_both(xs, pivot):
    # -1: both values below pivot, 1: both above, 0: anything else
    if xs[0] < pivot and xs[1] < pivot:
        return -1
    if xs[0] > pivot and xs[1] > pivot:
        return 1
    return 0

def sort_dict(d, pivot):
    # Returned list is ordered [below, mixed, above]
    dicts = [{}, {}, {}]
    for key, value in d.items():
        dicts[compare_both(value, pivot) + 1][key] = value
    return dicts
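
A short usage sketch for the functions above, repeated here so the snippet runs on its own. Because compare_both returns -1, 0 or 1 and sort_dict adds 1 to it, the returned list is ordered below, mixed, above. Note that "mixed" here means anything that is not strictly on one side for both values.

```python
def compare_both(xs, pivot):
    if xs[0] < pivot and xs[1] < pivot:
        return -1
    if xs[0] > pivot and xs[1] > pivot:
        return 1
    return 0

def sort_dict(d, pivot):
    dicts = [{}, {}, {}]
    for key, value in d.items():
        dicts[compare_both(value, pivot) + 1][key] = value
    return dicts

test = {1092268: [81, 90], 524292: [80, 80], 892456: [88, 88]}
testbelow, testboth, testabove = sort_dict(test, 85)
print(testbelow)  # {524292: [80, 80]}
print(testboth)   # {1092268: [81, 90]}
print(testabove)  # {892456: [88, 88]}
```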

Answer 3

A simple solution:

tmp = testboth, testabove, testbelow = {}, {}, {}
for k, v in test.items():
    tmp[(v[0] > 85 < v[1]) - (v[0] < 85 > v[1])][k] = v

It is also faster than Bhargav's solution, judging by a test with random input of appropriate size. Test and results:

from random import randrange, sample
from timeit import timeit

test = {key: [randrange(170), randrange(170)] for key in sample(range(10000000), 9500)}
# Bhargav() and Stefan() wrap the two solutions being compared (definitions not shown)
print('Bhargav', timeit(lambda: Bhargav(), number=100))
print('Stefan', timeit(lambda: Stefan(), number=100))

Bhargav 1.87454111948
Stefan 1.2636884789

Here are two more variations; I'm not sure which one I like best.

testboth, testabove, testbelow = tmp = {}, {}, {}
for k, (a, b) in test.items():
    tmp[(a > 85 < b) - (a < 85 > b)][k] = [a, b]

for k, v in test.items():
    a, b = v
    tmp[(a > 85 < b) - (a < 85 > b)][k] = v

These take about 1.7 and 1.2 seconds, respectively.
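
To unpack the index expression used in these loops: a > 85 < b is a chained comparison meaning a > 85 and 85 < b, and Python booleans behave as 0 and 1 in arithmetic. The difference is therefore 1 when both values are above, -1 when both are below, and 0 otherwise. Index -1 selects the last element of the tuple, which is why testbelow is placed last. A small sketch, with a helper name bucket_index invented purely for illustration:

```python
def bucket_index(a, b, pivot=85):
    # True/False act as 1/0, so the result is 1, -1 or 0
    return (a > pivot < b) - (a < pivot > b)

print(bucket_index(88, 88))  # 1  -> testabove
print(bucket_index(80, 80))  # -1 -> testbelow (last element)
print(bucket_index(81, 90))  # 0  -> testboth

testboth, testabove, testbelow = tmp = {}, {}, {}
test = {1092268: [81, 90], 524292: [80, 80], 892456: [88, 88]}
for k, (a, b) in test.items():
    tmp[bucket_index(a, b)][k] = [a, b]

print(testabove)  # {892456: [88, 88]}
```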

Answer 4

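There are two basic options: one if you need an iterable, and one if you need a more permanent data structure. The more permanent solution is to initialize all three target dictionaries, then iterate through the source dictionary and sort each item into the appropriate one.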

target_dicts = {'aboveabove':{}, 'belowbelow':{}, 'belowabove':{}}
for k,v in src_dict.items():
    first = 'above' if v[0] > 85 else 'below'
    second = 'above' if v[1] > 85 else 'below'
    result = first+second  # 'aboveabove', 'belowbelow', etc...
    if result in target_dicts:
        target_dicts[result][k] = v
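
A runnable sketch of the loop above, using the sample data from the question as src_dict. Two details are worth noting: a value of exactly 85 counts as 'below' here, and a pair that is above then below produces 'abovebelow', which is not a key of target_dicts and so is silently skipped. The entry with key 2 is made up to show that.

```python
src_dict = {1092268: [81, 90], 524292: [80, 80], 892456: [88, 88],
            2: [90, 80]}  # key 2 is a hypothetical above/below pair

target_dicts = {'aboveabove': {}, 'belowbelow': {}, 'belowabove': {}}
for k, v in src_dict.items():
    first = 'above' if v[0] > 85 else 'below'
    second = 'above' if v[1] > 85 else 'below'
    result = first + second
    if result in target_dicts:  # 'abovebelow' is not a key, so it is dropped
        target_dicts[result][k] = v

print(target_dicts['aboveabove'])  # {892456: [88, 88]}
print(target_dicts['belowbelow'])  # {524292: [80, 80]}
print(target_dicts['belowabove'])  # {1092268: [81, 90]}
```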

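This will populate your target_dicts dictionary appropriately. But maybe you don't need all of them? You may only need an iterator rather than actually rebuilding them in memory. Let's use filter!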

target_iterators = {
    'aboveabove': filter(
        lambda k: all(v > 85 for v in src_dict[k]), src_dict),
    'belowbelow': filter(
        lambda k: all(v <= 85 for v in src_dict[k]), src_dict),
    'belowabove': filter(
        lambda k: src_dict[k][0] <= 85 and src_dict[k][1] > 85, src_dict)}
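
In Python 3, filter returns a lazy iterator over the matching keys, and each iterator can be consumed only once (in Python 2 it returned a list). A minimal sketch of consuming one of them, using the sample data from the question as src_dict:

```python
src_dict = {1092268: [81, 90], 524292: [80, 80], 892456: [88, 88]}

above_keys = filter(lambda k: all(v > 85 for v in src_dict[k]), src_dict)

# Iterating yields keys, so look the values up again if they are needed
aboveabove = {k: src_dict[k] for k in above_keys}
print(aboveabove)        # {892456: [88, 88]}

# The iterator is now exhausted; a second pass yields nothing
print(list(above_keys))  # []
```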
