Find max value in a list of tuples for an element at index 1 of the tuples
I have a list like this:
dummy_list = [(8, 'N'),
(4, 'Y'),
(1, 'N'),
(1, 'Y'),
(3, 'N'),
(4, 'Y'),
(3, 'N'),
(2, 'Y'),
(1, 'N'),
(2, 'Y'),
(1, 'N')]
and would like to get the biggest value in the first column of the tuples where the value in the second column is 'Y'.
How do I do this as efficiently as possible?
You can use the max function with a generator expression.
>>> dummy_list = [(8, 'N'),
... (4, 'Y'),
... (1, 'N'),
... (1, 'Y'),
... (3, 'N'),
... (4, 'Y'),
... (3, 'N'),
... (2, 'Y'),
... (1, 'N'),
... (2, 'Y'),
... (1, 'N')]
>>>
>>> max(first for first, second in dummy_list if second == 'Y')
4
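One edge case worth keeping in mind: if no tuple carries the 'Y' flag, max raises ValueError on the empty generator. The following is a minimal sketch, assuming Python 3.4+ (for the default argument of max) and assuming that returning None is acceptable when nothing matches. The helper name max_flagged is hypothetical and not part of the original answer.

```python
def max_flagged(pairs, flag='Y'):
    """Return the largest first element among pairs whose second element equals flag.

    Returns None when no pair matches (an assumption; choose the default that suits you).
    """
    return max((first for first, second in pairs if second == flag), default=None)


dummy_list = [(8, 'N'), (4, 'Y'), (1, 'N'), (1, 'Y'), (3, 'N'), (4, 'Y'),
              (3, 'N'), (2, 'Y'), (1, 'N'), (2, 'Y'), (1, 'N')]

print(max_flagged(dummy_list))       # 4
print(max_flagged(dummy_list, 'X'))  # None, because no pair is flagged 'X'
```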
You can use pandas for this, as the data you have resembles a table.
import pandas as pd
df = pd.DataFrame(dummy_list, columns = ["Col 1", "Col 2"])
val_y = df[df["Col 2"] == "Y"]
max_index = val_y["Col 1"].idxmax()
print(df.loc[max_index, :])
First, convert the list into a pandas DataFrame using pd.DataFrame and set the column names to Col 1 and Col 2.
Then select all the rows of the DataFrame whose Col 2 value equals Y.
Once you have this data, select Col 1 and apply the idxmax function to it to get the index of the maximum value of that series.
You can then pass this index to loc as the row, and : (every) as the column, to get the whole row.
It can be compressed to two lines in this way:
max_index = df[df["Col 2"] == "Y"]["Col 1"].idxmax()
df.loc[max_index, :]
Output:
Col 1 4
Col 2 Y
Name: 1, dtype: object
max([i[0] for i in dummy_list if i[1] == 'Y'])
max([i for i in dummy_list if i[1] == 'Y'])
output: (4, 'Y')
or
max(filter(lambda x: x[1] == 'Y', dummy_list))
output: (4, 'Y')
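The last two calls return the whole tuple because Python compares tuples element by element, so the first column decides the winner here. If you prefer to make that explicit, a small sketch (assuming you want the full pair and then its first element) could pass a key to max; itemgetter comes from the standard library operator module.

```python
from operator import itemgetter

dummy_list = [(8, 'N'), (4, 'Y'), (1, 'N'), (1, 'Y'), (3, 'N'), (4, 'Y'),
              (3, 'N'), (2, 'Y'), (1, 'N'), (2, 'Y'), (1, 'N')]

# Keep only the 'Y' pairs, then compare them by their first element only.
best_pair = max((p for p in dummy_list if p[1] == 'Y'), key=itemgetter(0))

print(best_pair)     # (4, 'Y')
print(best_pair[0])  # 4
```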
By passing a key function to max, the search is refined in a single pass and no further iterations are required.
# Rank 'Y' pairs above 'N' pairs, then compare by the first element.
y_max = max(dummy_list, key=lambda p: (p[1] == 'Y', p[0]))[0]
print(y_max)
By decoupling the pairs and classifying them with respect to the Y, N values
d = {}
for k, v in dummy_list:
    # Group the first elements by their flag: {'N': [...], 'Y': [...]}
    d.setdefault(v, []).append(k)
y_max = max(d['Y'])
By zip-decoupling, one can use a mask-like approach with itertools.compress
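import itertools as it
values, flags = zip(*dummy_list)
# The map yields True for every 'Y' flag; compress keeps only the values at those positions.
y_max = max(it.compress(values, map('Y'.__eq__, flags)))
print(y_max)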
A basic for-loop approach
y_max = None  # stays None if no pair is flagged 'Y'
for i, c in dummy_list:
    if c == 'Y' and (y_max is None or i > y_max):
        y_max = i
print(y_max)
EDIT: benchmark results.
Each data list is shuffled before execution, and the results are ordered from fastest to slowest. The functions tested are those given by the users, and the identifiers should (I hope) make it easy to recognize the right one.
Test repeated 100 times with data of 11 terms (original amount of data)
max_gen ms: 8.184e-04
for_loop ms: 1.033e-03
dict_classifier ms: 1.270e-03
zip_compress ms: 1.326e-03
max_key ms: 1.413e-03
max_filter ms: 1.535e-03
pandas ms: 7.405e-01
Test repeated 100 times with data of 110 terms (10 x more data)
max_key ms: 1.497e-03
zip_compress ms: 7.703e-03
max_filter ms: 8.644e-03
for_loop ms: 9.669e-03
max_gen ms: 9.842e-03
dict_classifier ms: 1.046e-02
pandas ms: 7.745e-01
Test repeated 100 times with data of 110000 terms (10000 x more data)
max_key ms: 1.418e-03
max_gen ms: 4.787e+00
max_filter ms: 8.566e+00
dict_classifier ms: 9.116e+00
zip_compress ms: 9.801e+00
for_loop ms: 1.047e+01
pandas ms: 2.614e+01
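When the amount of data increases, the "performance classes" change, but max_key seems to be unaffected. Since max with a key function still has to visit every element, such a flat timing is more likely a measurement artifact than a genuine constant-time result.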
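The benchmark script itself is not shown, so the following is only a sketch of how such a comparison could be set up with timeit. The function names mirror some of the identifiers in the tables above, but their bodies, the data generation scheme in make_data, and the repeat count are assumptions rather than the code that produced the numbers.

```python
import random
import timeit
from itertools import compress


def max_gen(pairs):
    return max(v for v, f in pairs if f == 'Y')


def max_filter(pairs):
    return max(filter(lambda p: p[1] == 'Y', pairs))[0]


def zip_compress(pairs):
    values, flags = zip(*pairs)
    return max(compress(values, map('Y'.__eq__, flags)))


def for_loop(pairs):
    best = None
    for v, f in pairs:
        if f == 'Y' and (best is None or v > best):
            best = v
    return best


def make_data(n, seed=0):
    # Assumed scheme: random ints with random 'Y'/'N' flags, at least one 'Y'.
    rng = random.Random(seed)
    data = [(rng.randint(1, 100), rng.choice('YN')) for _ in range(n - 1)]
    data.append((rng.randint(1, 100), 'Y'))
    rng.shuffle(data)
    return data


def benchmark(n, repeat=100):
    data = make_data(n)
    funcs = [max_gen, max_filter, zip_compress, for_loop]
    # Sanity check: every candidate must agree before its timing means anything.
    expected = funcs[0](data)
    assert all(f(data) == expected for f in funcs)
    # Average time per call in milliseconds.
    timings = {f.__name__: timeit.timeit(lambda f=f: f(data), number=repeat) / repeat * 1e3
               for f in funcs}
    # Order from fastest to slowest, as in the tables above.
    return dict(sorted(timings.items(), key=lambda kv: kv[1]))


if __name__ == '__main__':
    for name, ms in benchmark(11).items():
        print(f"{name:<15} ms: {ms:.3e}")
```

Checking that all candidates return the same value before timing them is a cheap way to catch a function that is fast only because it computes something else.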