
Choosing the best index according to a condition among three lists with different ranges of values in Python

I have a dict with three keys, each mapping to a list of the same length. For example, key 'a' has a list of length 5 with values ranging from 0 to 6000. Similarly, key 'b' has 5 values ranging from 0 to 1.0. Finally, key 'c' has the same length, with values ranging from (1x1) to (2000x2000).

I have to select an index between 0 and 4 such that the value of 'a' is not lower than 200 and the value of 'b' is not lower than 0.95. Among the indices that meet these two conditions, I then choose the one with the highest value of 'c'.

Some dummy data:

  index     a          b           c
    0      600       0.99      (100x105)
    1      150        1.0       (50x40)
    2      820       0.75      (500x480)
    3      500       0.96      (200x190)
    4      400       0.97      (120x110)

Here, the two conditions filter the indices down to 0, 3 and 4. Among these three, the biggest value of 'c' (comparing width x height) belongs to index 3. So the answer is 3 500 0.96 (200x190)

How do I select this in the most efficient way? I think I might need to use pandas. How can I do it using pandas? Also, how can I do it in the most Pythonic way?

I am relatively new to coding and am having a hard time figuring it out.

Edit: a code snippet of the dict

{
'a' : [600, 150, 820, 500, 400],
'b' : [0.99, 1.0, 0.75, 0.96, 0.97],
'c' : [(100,105), (50,40), (500,480), (200,190), (120,110)]
}

This is relatively straightforward with numpy, although the slightly odd format of column c provides an interesting twist.

import numpy as np

d = {
'a' : [600, 150, 820, 500, 400],
'b' : [0.99, 1.0, 0.75, 0.96, 0.97],
'c' : [(100,105), (50,40), (500,480), (200,190), (120,110)]
}

# Load as numpy arrays. 
d_np = {key: np.array(value) for key, value in d.items()}

# Create logical mask based on given requirements ("cannot be lower than" means >=)
mask = np.logical_and(d_np['a'] >= 200, d_np['b'] >= 0.95)

# Multiply 'c' along dimension 1
c_product = np.prod(d_np['c'], axis=1)

# Get index of maximum value. Note that this index is relative to masked array.
max_index_masked = np.argmax(c_product[mask])

# Get original 'c' value. Need to mask the array so that our indexing works.
max_value = d_np['c'][mask][max_index_masked]

# Get index relative to unmasked array
index = np.arange(d_np['c'].shape[0])[mask][max_index_masked]
print(index)
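
The last two steps above (masking, then mapping the masked position back to the original index) can be shortened by working with the qualifying positions directly. This is only a minimal sketch on the same data, using np.flatnonzero; the names `candidates`, `area` and `best` are illustrative and not part of the original answer.

```python
import numpy as np

d = {
    'a': [600, 150, 820, 500, 400],
    'b': [0.99, 1.0, 0.75, 0.96, 0.97],
    'c': [(100, 105), (50, 40), (500, 480), (200, 190), (120, 110)],
}

a = np.array(d['a'])
b = np.array(d['b'])
area = np.prod(np.array(d['c']), axis=1)  # width * height for each entry of 'c'

# Positions in the original arrays that satisfy both thresholds
candidates = np.flatnonzero((a >= 200) & (b >= 0.95))

# argmax over the candidates only, then map back to the original index;
# None is returned here (an assumption) when nothing qualifies
best = int(candidates[np.argmax(area[candidates])]) if candidates.size else None
print(best)
```

Indexing `area` with `candidates` keeps the link to the original positions, so no second masking step is needed.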

A simple solution without numpy, using list comprehensions

    data = {
        'a' : [600, 150, 820, 500, 400],
        'b' : [0.99, 1.0, 0.75, 0.96, 0.97],
        'c' : [(100,105), (50,40), (500,480), (200,190), (120,110)]
    }
    select_a = [index_a for index_a in range(len(data['a'])) if data['a'][index_a] >=200]
    select_b = [index_b for index_b in select_a if data['b'][index_b]>=0.95]
    result = select_b[0]
    for index_c in select_b:
        if((data['c'][index_c][0]*data['c'][index_c][1])>(data['c'][result][0]*data['c'][result][1])):
            result = index_c
    print(result)

d = {
'a' : [600, 150, 820, 500, 400],
'b' : [0.99, 1.0, 0.75, 0.96, 0.97],
'c' : [(100,105), (50,40), (500,480), (200,190), (120,110)]
}

# Among the indices meeting both thresholds, pick the one whose 'c' product is largest
print(max((i for i in range(len(d['a'])) if d['a'][i] >= 200 and d['b'][i] >= 0.95), key=lambda i: d['c'][i][0] * d['c'][i][1]))

The output is 3.

Here's the data you have:

d = {'a':[600,150,820,500,400], 'b':[0.99,1.0,0.75,0.96,0.97], 'c':[(100,105),(50,40),(500,480),(200,190),(120,110)]}
a_thresh = 200
b_thresh = 0.95

This is one way of solving it, making just one pass over the lists in the dictionary:

from operator import mul

list_len = len(d['a'])
found_i = None  # stays None if no index meets both thresholds
for i in range(list_len):
    if ((d['a'][i]>=a_thresh) and (d['b'][i]>=b_thresh) and
        (found_i is None or mul(*d['c'][i]) > mul(*d['c'][found_i]))):
        found_i = i
print(found_i)

Output:

3

You can do this without importing and using the mul() function, of course. It is only there to make the loop condition a little more compact: mul() just multiplies the two parts of a tuple. To do this without mul(), replace (mul(*d['c'][i]) > mul(*d['c'][found_i])) with the longer expression ((d['c'][i][0]*d['c'][i][1]) > (d['c'][found_i][0]*d['c'][found_i][1]))
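
The same single-pass idea can also be written with the built-in max() and a key function, which avoids tracking found_i by hand. The following is a rough sketch rather than part of the answer above; the function name `best_index` and its default arguments are made up for illustration, and returning None when nothing qualifies is an assumption about the desired behaviour.

```python
def best_index(d, a_thresh=200, b_thresh=0.95):
    """Return the index with the largest 'c' product among entries
    meeting both thresholds, or None if no entry qualifies."""
    candidates = (
        i for i, (a, b) in enumerate(zip(d['a'], d['b']))
        if a >= a_thresh and b >= b_thresh
    )
    # key computes width * height; default handles the empty case
    return max(candidates, key=lambda i: d['c'][i][0] * d['c'][i][1], default=None)


d = {
    'a': [600, 150, 820, 500, 400],
    'b': [0.99, 1.0, 0.75, 0.96, 0.97],
    'c': [(100, 105), (50, 40), (500, 480), (200, 190), (120, 110)],
}

print(best_index(d))                   # index for the sample data
print(best_index(d, a_thresh=10_000))  # no entry qualifies
```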

My attempt at a NumPy solution. I tried to make it as readable as possible.

import numpy as np

d = {
    'a': [600, 150, 820, 500, 400],
    'b': [0.99, 1.0, 0.75, 0.96, 0.97],
    'c': [(100, 105), (50, 40), (500, 480), (200, 190), (120, 110)]
}

a = np.array([
    np.arange(len(d['a'])),
    d['a'],
    d['b'],
    np.prod(np.array(d['c']), axis=1)
])

a = a[:, a[1] >= 200]
a = a[:, a[2] >= .95]
a = a[:, np.argmax(a[3])]
index = int(a[0])

print('result:', d['a'][index], d['b'][index], d['c'][index])
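
Since the question also asks how to do this with pandas, here is one possible sketch, assuming pandas is installed. The helper column name `area` and the variable names `qualifying` and `best` are illustrative choices, not something taken from the question or the answers.

```python
import pandas as pd

d = {
    'a': [600, 150, 820, 500, 400],
    'b': [0.99, 1.0, 0.75, 0.96, 0.97],
    'c': [(100, 105), (50, 40), (500, 480), (200, 190), (120, 110)],
}

df = pd.DataFrame(d)

# 'c' holds (width, height) tuples, so derive a numeric column to compare on
df['area'] = df['c'].map(lambda wh: wh[0] * wh[1])

# Keep only the rows that meet both thresholds
qualifying = df[(df['a'] >= 200) & (df['b'] >= 0.95)]

# idxmax returns the original row label, so no index remapping is needed
best = qualifying['area'].idxmax() if not qualifying.empty else None

if best is not None:
    print(best, df.loc[best, 'a'], df.loc[best, 'b'], df.loc[best, 'c'])
```

Because filtering a DataFrame keeps the original row labels, the result refers directly to a position in the original lists.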

