
Sort values of a list in a dictionary

I have over 9000 data attributes in a dictionary. A simplified version looks like this:

test = {1092268: [81, 90], 524292: [80, 80], 892456: [88, 88]}

They are gene IDs, each with two values in a list. What I want is to end up with multiple dictionaries that contain all the IDs whose values are above or below a certain threshold. In this example, let's say I want three dictionaries: one containing the IDs whose values are both below 85, one where both are above 85, and one where the first value is below 85 and the second is above 85. So I would end up with this:

testabove = {892456: [88, 88]}

and

testbelow = {524292: [80, 80]}

and

testboth = {1092268: [81, 90]}

I have no idea how to figure this out.

It is simple to do that using dictionary comprehensions:

>>> testabove = {i:j for i,j in test.items() if j[0]>85 and j[1] > 85}
>>> testbelow = {i:j for i,j in test.items() if j[0]<85 and j[1] < 85}
>>> testboth = {i:j for i,j in test.items() if i not in testabove and i not in testbelow}
>>> testabove
{892456: [88, 88]}
>>> testbelow
{524292: [80, 80]}
>>> testboth
{1092268: [81, 90]}
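
One thing worth keeping in mind with this version: testboth is defined as "everything not in the other two", so it may also pick up pairs the question did not ask for, such as a value of exactly 85 or a first value above and a second value below. A minimal sketch of that behavior, assuming Python 3; the entries with keys 1 and 2 are made up for illustration and are not part of the original data:

```python
# Sample data from the question plus two hypothetical edge cases.
test = {
    1092268: [81, 90],
    524292: [80, 80],
    892456: [88, 88],
    1: [85, 85],  # hypothetical: both values exactly on the threshold
    2: [90, 80],  # hypothetical: first above, second below
}

testabove = {k: v for k, v in test.items() if v[0] > 85 and v[1] > 85}
testbelow = {k: v for k, v in test.items() if v[0] < 85 and v[1] < 85}
# Catch-all: every key that landed in neither dictionary above.
testboth = {k: v for k, v in test.items()
            if k not in testabove and k not in testbelow}

print(sorted(testboth))
```

Here keys 1 and 2 end up in testboth alongside 1092268. If only "first below, second above" should qualify, an explicit condition such as v[0] < 85 < v[1] would be needed instead of the catch-all.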

As Marein mentions in the comments below, there is another way to do it:

>>> test = {1092268: [81, 90], 524292: [80, 80], 892456: [88, 88]}
>>> testabove = {i:j for i,j in test.items() if all(x>85 for x in j)}
>>> testbelow = {i:j for i,j in test.items() if all(x<85 for x in j)}
>>> testabove
{892456: [88, 88]}
>>> testbelow
{524292: [80, 80]}

This uses the built-in all function.

Comparison

$ python -m timeit "test = {1092268: [81, 90], 524292: [80, 80], 892456: [88, 88]};testabove = {i:j for i,j in test.items() if all(x>85 for x in j)}"
100000 loops, best of 3: 2.29 usec per loop
$ python -m timeit "test = {1092268: [81, 90], 524292: [80, 80], 892456: [88, 88]};testabove = {i:j for i,j in test.items() if j[0]>85 and j[1] > 85}"
1000000 loops, best of 3: 0.99 usec per loop

As you can see, the straightforward way is roughly twice as fast as using all on this small input (0.99 vs 2.29 usec per loop).

Here's another solution that should be faster for a large amount of data, because it builds all three dictionaries in one iteration.

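def compare_both(xs, pivot):
    # -1: both values below pivot, 1: both above, 0: anything else
    if xs[0] < pivot and xs[1] < pivot:
        return -1
    if xs[0] > pivot and xs[1] > pivot:
        return 1
    return 0

def sort_dict(d, pivot):
    # dicts[0]: both below, dicts[1]: mixed, dicts[2]: both above
    dicts = [{}, {}, {}]
    for key, value in d.items():
        dicts[compare_both(value, pivot) + 1][key] = value
    return dicts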
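
For reference, here is how the single-pass functions above might be called on the sample data from the question. This is only a usage sketch: the unpacking order (below, mixed, above) follows from the +1 offset applied to the comparison result, and the variable names are taken from the question.

```python
def compare_both(xs, pivot):
    # -1: both values below pivot, 1: both above, 0: anything else
    if xs[0] < pivot and xs[1] < pivot:
        return -1
    if xs[0] > pivot and xs[1] > pivot:
        return 1
    return 0


def sort_dict(d, pivot):
    # Index 0: both below, index 1: mixed, index 2: both above.
    dicts = [{}, {}, {}]
    for key, value in d.items():
        dicts[compare_both(value, pivot) + 1][key] = value
    return dicts


test = {1092268: [81, 90], 524292: [80, 80], 892456: [88, 88]}

# Unpack in the order the list is built: below, mixed, above.
testbelow, testboth, testabove = sort_dict(test, 85)
print(testbelow, testboth, testabove)
```

As with the catch-all comprehension, the middle dictionary receives every pair that is not strictly on one side of the pivot, including values equal to it.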

Straightforward solution:

tmp = testboth, testabove, testbelow = {}, {}, {}
for k, v in test.items():
    tmp[(v[0] > 85 < v[1]) - (v[0] < 85 > v[1])][k] = v

It's also faster than Bhargav's solution, judging by tests with random input of appropriate size. Test and results:

from random import randrange, sample
from timeit import timeit

test = {key: [randrange(170), randrange(170)] for key in sample(range(10000000), 9500)}

# Bhargav() and Stefan() wrap the two solutions being compared (definitions not shown)
print('Bhargav', timeit(Bhargav, number=100))
print('Stefan', timeit(Stefan, number=100))

Bhargav 1.87454111948
Stefan 1.2636884789

Two more variations; I'm not sure which I like best.

testboth, testabove, testbelow = tmp = {}, {}, {}
for k, (a, b) in test.items():
    tmp[(a > 85 < b) - (a < 85 > b)][k] = [a, b]

for k, v in test.items():
    a, b = v
    tmp[(a > 85 < b) - (a < 85 > b)][k] = v

Timings for these are about 1.7 and 1.2 seconds.
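
The index expression in these loops may look cryptic at first. It combines two Python features: a chained comparison such as a > 85 < b means (a > 85 and 85 < b), and booleans are integers, so subtracting two of them gives 1, 0, or -1. Index -1 selects the last element of the tuple, which is why testbelow is listed last. A small sketch that spells this out; bucket_index is a hypothetical helper name used only for illustration:

```python
def bucket_index(a, b, pivot=85):
    """Return 1 if both values are above pivot, -1 if both are below, else 0."""
    # (a > pivot < b) is the chained comparison: a > pivot and pivot < b.
    # True and False behave as 1 and 0 under subtraction.
    return (a > pivot < b) - (a < pivot > b)


test = {1092268: [81, 90], 524292: [80, 80], 892456: [88, 88]}

# Tuple order matters: index 0 -> testboth, 1 -> testabove, -1 -> testbelow.
testboth, testabove, testbelow = tmp = {}, {}, {}
for k, (a, b) in test.items():
    tmp[bucket_index(a, b)][k] = [a, b]

print(testboth, testabove, testbelow)
```

Wrapping the expression in a function adds call overhead, so this form is meant for readability rather than speed.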

There are two basic options: one if you need an iterable, and one if you need a more permanent data structure. The more permanent solution is to initialize all three target dictionaries, then iterate through your source dict and sort each entry into the appropriate one.

target_dicts = {'aboveabove':{}, 'belowbelow':{}, 'belowabove':{}}
for k,v in src_dict.items():
    first = 'above' if v[0] > 85 else 'below'
    second = 'above' if v[1] > 85 else 'below'
    result = first+second  # 'aboveabove', 'belowbelow', etc...
    if result in target_dicts:
        target_dicts[result][k] = v

This will populate your target_dicts dictionaries appropriately. But maybe you don't need to use them all? You might just need an iterator, rather than actually rebuilding those in memory. Let's use a filter instead!

target_iterators = {
    'aboveabove': filter(
        lambda k: all(v > 85 for v in src_dict[k]), src_dict),
    'belowbelow': filter(
        lambda k: all(v <= 85 for v in src_dict[k]), src_dict),
    'belowabove': filter(
        lambda k: src_dict[k][0] <= 85 and src_dict[k][1] > 85, src_dict)}
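
Two properties of the filter approach are easy to overlook, assuming Python 3: each filter object yields keys (not key/value pairs), and it can be consumed only once. A minimal sketch using the sample data from the question, with src_dict standing in for the source dictionary:

```python
src_dict = {1092268: [81, 90], 524292: [80, 80], 892456: [88, 88]}

# Lazy iterator over the keys whose first value is <= 85 and second is > 85.
below_above = filter(
    lambda k: src_dict[k][0] <= 85 and src_dict[k][1] > 85, src_dict)

# The iterator yields keys, so values are looked up in src_dict as needed.
first_pass = [(k, src_dict[k]) for k in below_above]

# A filter object is exhausted after one pass; a second pass yields nothing.
second_pass = list(below_above)

print(first_pass, second_pass)
```

If the same selection is needed more than once, materializing it with list() or a dict comprehension is probably the simpler choice.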
