Choosing the best index according to a condition among three lists with different ranges of values in Python

I have a dict with three keys, each of which maps to a list; all three lists have the same length. For example, the list under key 'a' has length 5 and contains values ranging from 0 to 6000. Similarly, the list under key 'b', also of length 5, has values ranging from 0 to 1.0. Finally, the list under key 'c', of the same length, has values ranging from (1x1) to (2000x2000).

I have to select an index between 0 and 4, on the condition that the value of 'a' cannot be lower than 200 and the value of 'b' cannot be lower than 0.95. Then, among the indices that meet these two conditions, I need the one with the highest value of 'c'.

Dummy data would look as follows:

  index     a          b           c
    0      600       0.99      (100x105)
    1      150        1.0       (50x40)
    2      820       0.75      (500x480)
    3      500       0.96      (200x190)
    4      400       0.97      (120x110)

Here, according to the two conditions, I can filter the indices down to 0, 3 and 4. Among these three, index 3 has the biggest value of 'c'. So the answer is row 3: 500 0.96 (200x190).

How do I select this in the most efficient way? I think I might need to use pandas. How can I do it using pandas? Also, how can I do it in the most Pythonic way?

I am relatively new to coding and am having a hard time figuring it out.

Edit: a code snippet of the dict:

{
'a' : [600, 150, 820, 500, 400],
'b' : [0.99, 1.0, 0.75, 0.96, 0.97],
'c' : [(100,105), (50,40), (500,480), (200,190), (120,110)]
}
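The worked example in the question treats "biggest 'c'" as the largest area, i.e. width times height of each tuple, which is also what the answers assume. A quick sketch that checks that arithmetic on the sample data, and shows why Python's built-in tuple comparison is not the same thing (it is lexicographic). The `area` helper is a hypothetical name introduced here.

```python
d = {
    'a': [600, 150, 820, 500, 400],
    'b': [0.99, 1.0, 0.75, 0.96, 0.97],
    'c': [(100, 105), (50, 40), (500, 480), (200, 190), (120, 110)],
}

def area(wh):
    """Hypothetical helper: treat a (width, height) tuple as its area."""
    return wh[0] * wh[1]

# Indices passing both thresholds ("cannot be lower than" means >=).
passing = [i for i in range(len(d['a'])) if d['a'][i] >= 200 and d['b'][i] >= 0.95]
print(passing)                                 # [0, 3, 4]
print([area(d['c'][i]) for i in passing])      # [10500, 38000, 13200]

# Built-in tuple comparison looks at the first element, then the second,
# so it can disagree with comparing by area:
print(max([(300, 10), (200, 190)]))            # (300, 10)
print(max([(300, 10), (200, 190)], key=area))  # (200, 190)
```

On this particular data both orderings happen to pick (200, 190), but passing `key=area` makes the intended comparison explicit.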

This is relatively straightforward with numpy, although the slightly odd format of column c provides an interesting twist.

import numpy as np

d = {
'a' : [600, 150, 820, 500, 400],
'b' : [0.99, 1.0, 0.75, 0.96, 0.97],
'c' : [(100,105), (50,40), (500,480), (200,190), (120,110)]
}

# Load as numpy arrays. 
d_np = {key: np.array(value) for key, value in d.items()}

# Create logical mask: 'a' not lower than 200 and 'b' not lower than 0.95 (hence >=)
mask = np.logical_and(d_np['a'] >= 200, d_np['b'] >= 0.95)

# Multiply 'c' along dimension 1
c_product = np.prod(d_np['c'], axis=1)

# Get index of maximum value. Note that this index is relative to masked array.
max_index_masked = np.argmax(c_product[mask])

# Get original 'c' value. Need to mask the array so that our indexing works.
max_value = d_np['c'][mask][max_index_masked]

# Get index relative to unmasked array
index = np.arange(d_np['c'].shape[0])[mask][max_index_masked]
print(index)
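The last step above builds an index array only to undo the masking. A sketch of a slightly shorter route, assuming NumPy is available: `np.flatnonzero(mask)` returns the original positions of the True entries, so indexing it with the argmax over the masked products yields the original index directly. The empty-mask guard is an addition here, because `np.argmax` raises a ValueError on an empty array.

```python
import numpy as np

d = {
    'a': [600, 150, 820, 500, 400],
    'b': [0.99, 1.0, 0.75, 0.96, 0.97],
    'c': [(100, 105), (50, 40), (500, 480), (200, 190), (120, 110)],
}

a = np.array(d['a'])
b = np.array(d['b'])
c = np.array(d['c'])               # shape (5, 2): one (width, height) row per entry

mask = (a >= 200) & (b >= 0.95)    # both conditions, element-wise
candidates = np.flatnonzero(mask)  # original indices of the passing rows

if candidates.size == 0:
    index = None                   # nothing satisfies both conditions
else:
    areas = c[candidates].prod(axis=1)         # areas of the passing rows only
    index = int(candidates[np.argmax(areas)])  # map back to the original index

print(index)  # 3
```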

A simple solution without numpy, using list comprehensions:

    data = {
        'a' : [600, 150, 820, 500, 400],
        'b' : [0.99, 1.0, 0.75, 0.96, 0.97],
        'c' : [(100,105), (50,40), (500,480), (200,190), (120,110)]
    }
    select_a = [index_a for index_a in range(len(data['a'])) if data['a'][index_a] >=200]
    select_b = [index_b for index_b in select_a if data['b'][index_b]>=0.95]
    result = select_b[0] if select_b else None  # None when no index passes both filters
    for index_c in select_b:
        if((data['c'][index_c][0]*data['c'][index_c][1])>(data['c'][result][0]*data['c'][result][1])):
            result = index_c
    print(result)
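The same filter-then-compare idea can be packaged as a reusable function with the thresholds as parameters. This is a sketch; `best_index`, `a_min` and `b_min` are hypothetical names. Passing `default=None` to the built-in `max()` makes it return None instead of raising a ValueError when no row passes.

```python
def best_index(data, a_min=200, b_min=0.95):
    """Hypothetical helper: index with the largest 'c' area among rows whose
    'a' >= a_min and 'b' >= b_min, or None if no row qualifies."""
    candidates = [
        i for i in range(len(data['a']))
        if data['a'][i] >= a_min and data['b'][i] >= b_min
    ]
    # Compare candidates by width * height of their 'c' tuple.
    return max(candidates, key=lambda i: data['c'][i][0] * data['c'][i][1], default=None)


data = {
    'a': [600, 150, 820, 500, 400],
    'b': [0.99, 1.0, 0.75, 0.96, 0.97],
    'c': [(100, 105), (50, 40), (500, 480), (200, 190), (120, 110)],
}
print(best_index(data))              # 3
print(best_index(data, a_min=1000))  # None
```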

A one-liner using zip() and enumerate():

d = {
'a' : [600, 150, 820, 500, 400],
'b' : [0.99, 1.0, 0.75, 0.96, 0.97],
'c' : [(100,105), (50,40), (500,480), (200,190), (120,110)]
}

# Pair each passing row's area with its original index, then take the index of the largest area
print(max((c[0] * c[1], i) for i, (a, b, c) in enumerate(zip(d['a'], d['b'], d['c'])) if a >= 200 and b >= 0.95)[1])

The output is 3.

Here's the data you have:

d = {'a':[600,150,820,500,400], 'b':[0.99,1.0,0.75,0.96,0.97], 'c':[(100,105),(50,40),(500,480),(200,190),(120,110)]}
a_thresh = 200
b_thresh = 0.95

This is one way to solve it, making just one pass over the lists in the dictionary:

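from operator import mul

list_len = len(d['a'])
found_i = None  # no qualifying index found yet
for i in range(list_len):
    # keep i only if it passes both thresholds and beats the best area so far
    if ((d['a'][i] >= a_thresh) and (d['b'][i] >= b_thresh) and
            (found_i is None or mul(*d['c'][i]) > mul(*d['c'][found_i]))):
        found_i = i
print(found_i)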

Output:

3

You can, of course, do this without importing and using the mul() function; it only makes the loop condition a little more compact. mul() is just used to multiply the two parts of a tuple. To do without it, replace mul(*d['c'][i]) with d['c'][i][0]*d['c'][i][1], and mul(*d['c'][found_i]) with d['c'][found_i][0]*d['c'][found_i][1].
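On Python 3.8 and later, `math.prod` is another way to avoid the `operator` import: it multiplies all items of an iterable, so it would also cover tuples with more than two entries. A sketch of the same single pass under that assumption, iterating the three lists together with `zip`:

```python
import math

d = {'a': [600, 150, 820, 500, 400],
     'b': [0.99, 1.0, 0.75, 0.96, 0.97],
     'c': [(100, 105), (50, 40), (500, 480), (200, 190), (120, 110)]}
a_thresh = 200
b_thresh = 0.95

found_i = None  # no qualifying index found yet
for i, (a, b, c) in enumerate(zip(d['a'], d['b'], d['c'])):
    # keep i only if it passes both thresholds and beats the best area so far
    if a >= a_thresh and b >= b_thresh and (
        found_i is None or math.prod(c) > math.prod(d['c'][found_i])
    ):
        found_i = i
print(found_i)  # 3
```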

My attempt at a NumPy solution. I tried to make it as readable as possible.

import numpy as np

d = {
    'a': [600, 150, 820, 500, 400],
    'b': [0.99, 1.0, 0.75, 0.96, 0.97],
    'c': [(100, 105), (50, 40), (500, 480), (200, 190), (120, 110)]
}

a = np.array([
    np.arange(len(d['a'])),
    d['a'],
    d['b'],
    np.prod(np.array(d['c']), axis=1)
])

a = a[:, a[1] >= 200]
a = a[:, a[2] >= .95]
a = a[:, np.argmax(a[3])]
index = int(a[0])

print('result:', d['a'][index], d['b'][index], d['c'][index])
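The question also asks how to do this with pandas. A minimal sketch, assuming pandas is installed: load the dict as a DataFrame, derive an `area` column (a name introduced here) from the tuples in 'c', filter with a boolean mask, and take `idxmax()` of the remaining areas, which returns the original row label.

```python
import pandas as pd

d = {
    'a': [600, 150, 820, 500, 400],
    'b': [0.99, 1.0, 0.75, 0.96, 0.97],
    'c': [(100, 105), (50, 40), (500, 480), (200, 190), (120, 110)],
}

df = pd.DataFrame(d)  # column 'c' holds Python tuples (object dtype)
df['area'] = df['c'].apply(lambda wh: wh[0] * wh[1])  # width * height

# Boolean mask for both conditions: use & (not `and`) and parenthesize each comparison
passing = df[(df['a'] >= 200) & (df['b'] >= 0.95)]

best = passing['area'].idxmax()  # row label of the largest area among passing rows
print(best, df.loc[best, 'a'], df.loc[best, 'b'], df.loc[best, 'c'])  # 3 500 0.96 (200, 190)
```

If it is possible that no row qualifies, check `passing.empty` before calling `idxmax()`, since an empty selection has no maximum to point to.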
