I have over 9000 data attributes in a dictionary. A simple version looks like this:
test = {1092268: [81, 90], 524292: [80, 80], 892456: [88, 88]}
They are gene IDs, each with two values in a list. I want to end up with multiple dictionaries that contain all the IDs whose values are above or below a certain threshold. In this example, let's say I want three dictionaries: one with the IDs whose values are both below 85, another with both above 85, and the last with the first value below 85 and the second above 85. So I would end up with this:
testabove = { 892456: [88, 88]}
and
testbelow = { 524292: [80, 80]}
and
testboth = {1092268: [81, 90]}
I have no idea how to figure this out.
It is simple to do that using a dictionary comprehension:
>>> testabove = {i:j for i,j in test.items() if j[0]>85 and j[1] > 85}
>>> testbelow = {i:j for i,j in test.items() if j[0]<85 and j[1] < 85}
>>> testboth = {i:j for i,j in test.items() if i not in testabove and i not in testbelow}
>>> testabove
{892456: [88, 88]}
>>> testbelow
{524292: [80, 80]}
>>> testboth
{1092268: [81, 90]}
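Because both comparisons are strict, a pair containing exactly 85 is neither "above" nor "below", so it lands in testboth. The same happens to a pair whose first value is above and second below, which the question's description of testboth doesn't mention. A quick check with two hypothetical extra ids (111 and 222) that are not in the original data:

```python
test = {1092268: [81, 90], 524292: [80, 80], 892456: [88, 88],
        111: [85, 90],   # hypothetical id with a value exactly at the threshold
        222: [90, 81]}   # hypothetical id with first above, second below

testabove = {i: j for i, j in test.items() if j[0] > 85 and j[1] > 85}
testbelow = {i: j for i, j in test.items() if j[0] < 85 and j[1] < 85}
# Everything that is not strictly above or strictly below ends up here.
testboth = {i: j for i, j in test.items() if i not in testabove and i not in testbelow}

print(testboth)  # {1092268: [81, 90], 111: [85, 90], 222: [90, 81]}
```

Switch to >= or <= if values equal to 85 should count on one side.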
As Marein mentions in the comments below, another way to do it is:
>>> test = {1092268: [81, 90], 524292: [80, 80], 892456: [88, 88]}
>>> testabove = {i:j for i,j in test.items() if all(x>85 for x in j)}
>>> testbelow = {i:j for i,j in test.items() if all(x<85 for x in j)}
>>> testabove
{892456: [88, 88]}
>>> testbelow
{524292: [80, 80]}
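A side benefit of all() is that the condition does not hardcode how many values each id has. The sketch below wraps the idea in a small helper, split_by_threshold (a hypothetical name, not from the answers above), that returns all three dictionaries; the hypothetical id 7 carries three values to show that it still works.

```python
def split_by_threshold(d, pivot):
    """Split d into (above, below, mixed) dicts by comparing every value to pivot."""
    above = {k: v for k, v in d.items() if all(x > pivot for x in v)}
    below = {k: v for k, v in d.items() if all(x < pivot for x in v)}
    # Anything not strictly above or strictly below goes here.
    mixed = {k: v for k, v in d.items() if k not in above and k not in below}
    return above, below, mixed

test = {1092268: [81, 90], 524292: [80, 80], 892456: [88, 88], 7: [86, 87, 99]}
above, below, mixed = split_by_threshold(test, 85)
print(above)  # {892456: [88, 88], 7: [86, 87, 99]}
print(below)  # {524292: [80, 80]}
print(mixed)  # {1092268: [81, 90]}
```

Note that all() of an empty list is True, so an id with an empty list would appear in both above and below; filter those out first if they can occur.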
This uses the built-in all function.
Comparison
$ python -m timeit "test = {1092268: [81, 90], 524292: [80, 80], 892456: [88, 88]};testabove = {i:j for i,j in test.items() if all(x>85 for x in j)}"
100000 loops, best of 3: 2.29 usec per loop
$ python -m timeit "test = {1092268: [81, 90], 524292: [80, 80], 892456: [88, 88]};testabove = {i:j for i,j in test.items() if j[0]>85 and j[1] > 85}"
1000000 loops, best of 3: 0.99 usec per loop
As you can see, the straightforward way is faster than using all.
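The command-line timings above also rebuild the three-entry test dict on every loop, and three ids say little about how the two versions behave on the roughly 9000 ids in the question. A sketch that times only the filtering step on a larger random stand-in dict (the data, seed, and function names are assumptions for illustration) using the timeit module:

```python
import random
import timeit

random.seed(0)
# Random stand-in for the real data: 9500 ids, each with two values in [0, 170).
test = {k: [random.randrange(170), random.randrange(170)] for k in range(9500)}

def with_indexing():
    return {i: j for i, j in test.items() if j[0] > 85 and j[1] > 85}

def with_all():
    return {i: j for i, j in test.items() if all(x > 85 for x in j)}

# Both versions must select the same ids before comparing their speed.
assert with_indexing() == with_all()

for fn in (with_indexing, with_all):
    print(fn.__name__, timeit.timeit(fn, number=100))
```

Absolute numbers depend on the machine and Python version, so compare the two lines relative to each other.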
Here's another solution that should be faster for a large amount of data, because it builds all three dictionaries in one iteration.
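def compare_both(xs, pivot):
    # -1 if both values are below pivot, 1 if both are above, 0 otherwise
    if xs[0] < pivot and xs[1] < pivot: return -1
    if xs[0] > pivot and xs[1] > pivot: return 1
    return 0

def sort_dict(d, pivot):
    dicts = [{}, {}, {}]  # [below, mixed, above]
    for key, value in d.items():
        # shift the -1/0/1 result to list index 0/1/2
        dicts[compare_both(value, pivot) + 1][key] = value
    return dicts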
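Since compare_both's -1/0/1 result is shifted by one to index the list, sort_dict returns the dictionaries in the order below, mixed, above. A usage sketch on the question's data (the functions are repeated so the snippet runs on its own):

```python
def compare_both(xs, pivot):
    # -1: both below pivot, 1: both above, 0: anything else
    if xs[0] < pivot and xs[1] < pivot:
        return -1
    if xs[0] > pivot and xs[1] > pivot:
        return 1
    return 0

def sort_dict(d, pivot):
    dicts = [{}, {}, {}]  # index 0: below, 1: mixed, 2: above
    for key, value in d.items():
        dicts[compare_both(value, pivot) + 1][key] = value
    return dicts

test = {1092268: [81, 90], 524292: [80, 80], 892456: [88, 88]}
testbelow, testboth, testabove = sort_dict(test, 85)
print(testbelow)  # {524292: [80, 80]}
print(testboth)   # {1092268: [81, 90]}
print(testabove)  # {892456: [88, 88]}
```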
Straightforward solution:
tmp = testboth, testabove, testbelow = {}, {}, {}
for k, v in test.items():
    # 1 -> tmp[1] (testabove), -1 -> tmp[-1] (testbelow), 0 -> tmp[0] (testboth)
    tmp[(v[0] > 85 < v[1]) - (v[0] < 85 > v[1])][k] = v
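The index expression relies on two Python details. First, a chained comparison such as a > 85 < b means a > 85 and 85 < b. Second, True and False subtract like 1 and 0, so the expression evaluates to 1 (both above), -1 (both below) or 0 (mixed), and index -1 picks the last element of tmp, which is testbelow. A quick check using a hypothetical helper named bucket:

```python
def bucket(a, b, pivot=85):
    # (a > pivot < b) is the chained form of (a > pivot and pivot < b)
    return (a > pivot < b) - (a < pivot > b)

print(bucket(88, 88))  # 1  -> tmp[1] is testabove
print(bucket(80, 80))  # -1 -> tmp[-1] is testbelow
print(bucket(81, 90))  # 0  -> tmp[0] is testboth
```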
It's also faster than Bhargav's solution, judging by tests with random input of appropriate size. Test and results:
from random import randrange, sample
from timeit import timeit
# 9500 random ids, each with two values in [0, 170)
test = {key: [randrange(170), randrange(170)] for key in sample(range(10000000), 9500)}
# Bhargav() and Stefan() are assumed to wrap the respective solutions
print('Bhargav', timeit(Bhargav, number=100))
print('Stefan', timeit(Stefan, number=100))
Bhargav 1.87454111948
Stefan 1.2636884789
Two more variations; I'm not sure which I like best.
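testboth, testabove, testbelow = tmp = {}, {}, {}

# Variation 1: unpack the pair directly in the for statement
for k, (a, b) in test.items():
    tmp[(a > 85 < b) - (a < 85 > b)][k] = [a, b]

# Variation 2: unpack inside the loop body and store the original list
for k, v in test.items():
    a, b = v
    tmp[(a > 85 < b) - (a < 85 > b)][k] = v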
Timings for these are about 1.7 and 1.2 seconds.
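One behavioral difference between the two variations: the first stores a new list [a, b] for every id, while the second stores the original list object v, so the result dictionaries share their lists with test. That extra list construction is a plausible reason the first one is slower. A quick identity check on the question's data:

```python
test = {1092268: [81, 90], 524292: [80, 80], 892456: [88, 88]}

tmp1 = {}, {}, {}  # (both, above, below) built by variation 1
for k, (a, b) in test.items():
    tmp1[(a > 85 < b) - (a < 85 > b)][k] = [a, b]  # a new list object

tmp2 = {}, {}, {}  # (both, above, below) built by variation 2
for k, v in test.items():
    a, b = v
    tmp2[(a > 85 < b) - (a < 85 > b)][k] = v  # the same list object as in test

print(tmp1 == tmp2)                     # True: equal contents
print(tmp1[1][892456] is test[892456])  # False: a copy
print(tmp2[1][892456] is test[892456])  # True: shared with test
```

With variation 2, mutating a list in test (for example test[892456].append(1)) also changes what testabove shows.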
There are two basic options: one if you need an iterable, and one if you need a more permanent data structure. The more permanent solution is to initialize all three target dictionaries, then iterate through your source dict and sort each entry into the appropriate one.
target_dicts = {'aboveabove': {}, 'belowbelow': {}, 'belowabove': {}}
for k, v in src_dict.items():
    first = 'above' if v[0] > 85 else 'below'
    second = 'above' if v[1] > 85 else 'below'
    result = first + second  # 'aboveabove', 'belowbelow', etc...
    if result in target_dicts:
        target_dicts[result][k] = v
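Run on the question's data (bound to src_dict here), the loop fills the three dictionaries. Because there is no 'abovebelow' key, a pair such as [90, 81] (hypothetical id 222, not in the original data) is silently skipped, and a value of exactly 85 counts as 'below':

```python
src_dict = {1092268: [81, 90], 524292: [80, 80], 892456: [88, 88],
            222: [90, 81]}  # 222 is a hypothetical extra id

target_dicts = {'aboveabove': {}, 'belowbelow': {}, 'belowabove': {}}
for k, v in src_dict.items():
    first = 'above' if v[0] > 85 else 'below'
    second = 'above' if v[1] > 85 else 'below'
    result = first + second
    if result in target_dicts:  # 'abovebelow' is not a key, so id 222 is dropped
        target_dicts[result][k] = v

print(target_dicts)
# {'aboveabove': {892456: [88, 88]}, 'belowbelow': {524292: [80, 80]}, 'belowabove': {1092268: [81, 90]}}
```

Add an 'abovebelow' entry to target_dicts if those ids matter.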
This will populate your target_dicts dictionaries appropriately. But maybe you don't need them all? You might only need an iterator rather than actually rebuilding those dictionaries in memory. Let's use filter instead!
target_iterators = {
    'aboveabove': filter(
        lambda k: all(v > 85 for v in src_dict[k]), src_dict),
    'belowbelow': filter(
        lambda k: all(v <= 85 for v in src_dict[k]), src_dict),
    'belowabove': filter(
        lambda k: src_dict[k][0] <= 85 and src_dict[k][1] > 85, src_dict)}
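In Python 3, filter returns a lazy iterator over the matching keys, not a dictionary: nothing is evaluated until you loop over it, and it can be consumed only once. A sketch of materializing one group, and of what a second pass yields:

```python
src_dict = {1092268: [81, 90], 524292: [80, 80], 892456: [88, 88]}

above_keys = filter(lambda k: all(v > 85 for v in src_dict[k]), src_dict)

# Build a dict from the keys the iterator yields.
testabove = {k: src_dict[k] for k in above_keys}
print(testabove)         # {892456: [88, 88]}

# The iterator is now exhausted, so a second pass yields nothing.
print(list(above_keys))  # []
```

If you need to loop over a group more than once, either store the result as above or create a fresh filter each time.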