
Find max value in a list of sets for an element at index 1 of sets

I have a list like this:

dummy_list = [(8, 'N'),
 (4, 'Y'),
 (1, 'N'),
 (1, 'Y'),
 (3, 'N'),
 (4, 'Y'),
 (3, 'N'),
 (2, 'Y'),
 (1, 'N'),
 (2, 'Y'),
 (1, 'N')]

and would like to get the largest value in the first column among the tuples whose second-column value is 'Y'.

How do I do this as efficiently as possible?

You can use the max function with a generator expression.

>>> dummy_list = [(8, 'N'),
...  (4, 'Y'),
...  (1, 'N'),
...  (1, 'Y'),
...  (3, 'N'),
...  (4, 'Y'),
...  (3, 'N'),
...  (2, 'Y'),
...  (1, 'N'),
...  (2, 'Y'),
...  (1, 'N')]
>>>
>>> max(first for first, second in dummy_list if second == 'Y')
4
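
If the list might contain no 'Y' rows, max raises ValueError on the empty generator. A minimal sketch of one way to guard against that, assuming Python 3.4+ where max accepts a default keyword; the helper name max_where_flag is hypothetical and not part of the original answer:

```python
def max_where_flag(pairs, flag='Y', default=None):
    """Return the largest first element among pairs whose second element equals flag.

    Falls back to `default` when no pair matches (hypothetical helper).
    """
    return max((first for first, second in pairs if second == flag),
               default=default)


dummy_list = [(8, 'N'), (4, 'Y'), (1, 'N'), (1, 'Y'), (3, 'N'), (4, 'Y'),
              (3, 'N'), (2, 'Y'), (1, 'N'), (2, 'Y'), (1, 'N')]

print(max_where_flag(dummy_list))             # 4
print(max_where_flag([(8, 'N'), (3, 'N')]))   # None, because no row is flagged 'Y'
```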

You can use pandas for this, as the data you have resembles a table.

import pandas as pd

df = pd.DataFrame(dummy_list, columns = ["Col 1", "Col 2"]) 
val_y = df[df["Col 2"] == "Y"]
max_index = val_y["Col 1"].idxmax()

print(df.loc[max_index, :])

First, convert the list into a pandas DataFrame using pd.DataFrame and set the column names to Col 1 and Col 2.

Then select all the rows of the DataFrame whose Col 2 value equals Y.

Once you have this data, select Col 1 and apply idxmax to it to get the index of the maximum value in that Series.

You can then pass this index to loc as the row, and : (every) as the column, to get the whole row.

It can be compressed to two lines like this:

max_index = df[df["Col 2"] == "Y"]["Col 1"].idxmax()
df.loc[max_index, :]

Output:

Col 1    4
Col 2    Y
Name: 1, dtype: object

max([i[0] for i in dummy_list if i[1] == 'Y'])

max([i for i in dummy_list if i[1] == 'Y'])

output: (4, 'Y')

or


max(filter(lambda x: x[1] == 'Y', dummy_list))

output: (4, 'Y')
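
Both snippets above return the whole tuple because max compares tuples lexicographically: the first elements are compared, and the second elements only break ties. A small sketch that makes the comparison explicit with operator.itemgetter, so only index 0 decides the result; the names y_rows and best_pair are illustrative:

```python
from operator import itemgetter

dummy_list = [(8, 'N'), (4, 'Y'), (1, 'N'), (1, 'Y'), (3, 'N'), (4, 'Y'),
              (3, 'N'), (2, 'Y'), (1, 'N'), (2, 'Y'), (1, 'N')]

# Keep only the rows flagged 'Y'
y_rows = [pair for pair in dummy_list if pair[1] == 'Y']

# Compare on the first element only
best_pair = max(y_rows, key=itemgetter(0))

print(best_pair)      # (4, 'Y')
print(best_pair[0])   # 4
```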

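By passing a key function to max, the filtering and the comparison happen in a single pass, so no further iteration is required.

# Rank 'Y' rows above all others, then compare by value.
# Note: if no row is flagged 'Y', this returns the largest 'N' value instead.
y_max = max(dummy_list, key=lambda p: (p[1] == 'Y', p[0]))[0]
print(y_max)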

By decoupling the pairs and classifying them by their Y / N values:

d = {}
for k, v in dummy_list:
    d.setdefault(v, []).append(k)

y_max = max(d['Y'])

By decoupling with zip, one can take a mask-like approach using itertools.compress:

import itertools as it

values, flags = zip(*dummy_list)
# Keep only the values whose flag equals 'Y', then take the maximum
y_max = max(it.compress(values, map('Y'.__eq__, flags)))
print(y_max)

A basic for-loop approach:

# Start from None so that an 'N' row can never seed the maximum
y_max = None
for i, c in dummy_list:
    if c == 'Y' and (y_max is None or i > y_max):
        y_max = i
print(y_max)

EDIT: benchmark results.

Each data list is shuffled before execution, and the results are ordered from fastest to slowest. The functions tested are those given in the answers, and the identifiers should (I hope) make it easy to recognize each one.

Test repeated 100 times on data with 11 items (the original amount of data)

max_gen         ms: 8.184e-04
for_loop        ms: 1.033e-03
dict_classifier ms: 1.270e-03
zip_compress    ms: 1.326e-03
max_key         ms: 1.413e-03
max_filter      ms: 1.535e-03
pandas          ms: 7.405e-01

Test repeated 100 times on data with 110 items (10x more data)

max_key         ms: 1.497e-03
zip_compress    ms: 7.703e-03
max_filter      ms: 8.644e-03
for_loop        ms: 9.669e-03
max_gen         ms: 9.842e-03
dict_classifier ms: 1.046e-02
pandas          ms: 7.745e-01

Test repeated 100 times on data with 110000 items (10000x more data)

max_key         ms: 1.418e-03
max_gen         ms: 4.787e+00
max_filter      ms: 8.566e+00
dict_classifier ms: 9.116e+00
zip_compress    ms: 9.801e+00
for_loop        ms: 1.047e+01
pandas          ms: 2.614e+01

As the amount of data increases, the ranking of the approaches changes, but max_key appears to be unaffected. That is surprising, since max has to visit every element, so the max_key timings should be treated with caution.
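
The benchmark script itself is not shown above. A rough sketch of how such a comparison could be set up with timeit follows; the benchmark helper, the subset of functions timed, and the choice to shuffle once per run are all assumptions rather than the original harness. The max_key variant here uses a key that ranks 'Y' rows first so that all three functions return the same value.

```python
import random
import timeit


def max_gen(pairs):
    # Generator expression fed straight into max
    return max(first for first, second in pairs if second == 'Y')


def max_key(pairs):
    # Assumed key: rank 'Y' rows above all others, then compare by value
    return max(pairs, key=lambda p: (p[1] == 'Y', p[0]))[0]


def for_loop(pairs):
    best = None
    for value, flag in pairs:
        if flag == 'Y' and (best is None or value > best):
            best = value
    return best


def benchmark(funcs, pairs, repeat=100):
    """Return {function name: mean milliseconds per call} over `repeat` calls."""
    data = list(pairs)
    random.shuffle(data)  # shuffle a copy so the caller's list is untouched
    results = {}
    for func in funcs:
        seconds = timeit.timeit(lambda: func(data), number=repeat)
        results[func.__name__] = seconds / repeat * 1e3
    return results


dummy_list = [(8, 'N'), (4, 'Y'), (1, 'N'), (1, 'Y'), (3, 'N'), (4, 'Y'),
              (3, 'N'), (2, 'Y'), (1, 'N'), (2, 'Y'), (1, 'N')]

timings = benchmark([max_gen, max_key, for_loop], dummy_list * 1000)
for name, ms in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{name:<10} ms: {ms:.3e}")
```

Timings depend on the machine and the Python version, so the absolute numbers will differ from those listed above.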
