
Python - extract min/max value from list of tuples

I have a list of tuples as follows:

data = [
    ('A', '59', '62'), ('A', '2', '6'), ('A', '87', '92'),
    ('A', '98', '104'), ('A', '111', '117'),
    ('B', '66', '71'), ('B', '25', '31'), ('B', '34', '40'), ('B', '46', '53'),
    ('B', '245', '251'), ('B', '235', '239'), ('B', '224', '229'), ('B', '135', '140'),
    ('C', '157', '162'), ('C', '203', '208'),
    ('D', '166', '173'), ('D', '176', '183'),
    ('E', '59', '62'), ('E', '2', '6'), ('E', '87', '92'), ('E', '98', '104'), ('E', '111', '117')
]

They are a subset of a bigger dataset, which I extracted as above to simplify this post. The first element of each tuple (i.e. A, B, C, D, E...) is an identifier and can appear multiple times.

I would like to extract, for each ID/category (A, B, C, D, E...):

1 - the minimum of the 2nd element of the tuples

2 - the maximum of the 3rd element of the tuples

The final output should look like:

A: min = 2, max = 117
B: min = 25, max = 251
C: min = 157, max = 208
D: min = 166, max = 183
E: min = 2, max = 117
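As a worked illustration of the rule above (not part of the original post), here is the computation for ID A alone, using the A rows from data. The minimum is taken over the 2nd elements and the maximum over the 3rd elements, after converting the strings to int. The names rows_a, starts and ends are chosen only for this sketch.

```python
# Rows for ID 'A', copied from the data list above (rows_a is a name chosen for this sketch).
rows_a = [('A', '59', '62'), ('A', '2', '6'), ('A', '87', '92'),
          ('A', '98', '104'), ('A', '111', '117')]

starts = [int(row[1]) for row in rows_a]  # 2nd elements: [59, 2, 87, 98, 111]
ends = [int(row[2]) for row in rows_a]    # 3rd elements: [62, 6, 92, 104, 117]

print('A: min = {}, max = {}'.format(min(starts), max(ends)))  # A: min = 2, max = 117
```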

I tried an approach based on this post: How to remove duplicate from list of tuple when order is important

To simplify testing, I used tuples with only the first 2 elements and extracted the minimum only.

The output looks like this:

('A', '111')
('B', '135')
('C', '157')
('D', '166')
('E', '111')

It should be:

('A', '2')
('B', '25')
('C', '157')
('D', '166')
('E', '2')

I'm looking for an approach that works with the complete "triple tuple" example, so as to avoid splitting the data into multiple subsets.

Many thanks for your time.

EDIT 1 - 31/10/2018

Hello,

Please see my edit below, which includes the code snippet I left out earlier. This snippet gives the erroneous minimum values shown in the preceding part of the post.

from collections import OrderedDict

data_min_only = [('A', '59'), ('A', '2'), ('A', '87'), ('A', '98'), ('A', '111'), ('B', '66'), ('B', '25'), ('B', '34'), ('B', '46'), ('B', '245'), ('B', '235'), ('B', '224'), ('B', '135'), ('C', '157'), ('C', '203'), ('D', '166'), ('D', '176'), ('E', '59'), ('E', '2'), ('E', '87'), ('E', '98'), ('E', '111')]

empty_dict = OrderedDict()

for item in data_min_only:

    # Get old value in dictionary if it exists
    old = empty_dict.get(item[0])

    # Skip if new item is larger than old.
    # Note: item[1] and old[1] are strings, so '>' compares them lexicographically.
    if old:
        if item[1] > old[1]:
            continue
        else:
            del empty_dict[item[0]]

    # Assign
    empty_dict[item[0]] = item

print(list(empty_dict.values()))

I thought the problem might be the order of the tuple values for each category (they should be sorted from smallest to largest before iterating through data_min_only).
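A likely explanation, offered as a sketch: the values in data_min_only are strings, and Python compares strings character by character, so '111' < '2' is True. Sorting the strings would not help for the same reason. Converting each value to int before comparing gives the expected minimums. The dictionary name min_by_id is chosen for this example, and a plain dict replaces OrderedDict because dicts keep insertion order in Python 3.7+.

```python
# String comparison is lexicographic; integer comparison is numeric.
print('111' < '2')            # True: '1' sorts before '2'
print(int('111') < int('2'))  # False

data_min_only = [('A', '59'), ('A', '2'), ('A', '87'), ('A', '98'), ('A', '111'),
                 ('B', '66'), ('B', '25'), ('B', '34'), ('B', '46'), ('B', '245'),
                 ('B', '235'), ('B', '224'), ('B', '135'), ('C', '157'), ('C', '203'),
                 ('D', '166'), ('D', '176'), ('E', '59'), ('E', '2'), ('E', '87'),
                 ('E', '98'), ('E', '111')]

# Plain dicts keep insertion order (Python 3.7+), so IDs come out in first-seen order.
min_by_id = {}
for key, value in data_min_only:
    value = int(value)  # convert before comparing
    if key not in min_by_id or value < min_by_id[key]:
        min_by_id[key] = value

print(list(min_by_id.items()))
# [('A', 2), ('B', 25), ('C', 157), ('D', 166), ('E', 2)]
```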

Thank you to all posters for their prompt responses and suggestions/solutions! I'm currently working through them to understand them and adapt them further.

EDIT 2 - 31/10/2018

I tweaked @slider's suggestion to retrieve the difference between the max and the min. I also tried to collect that result in a list as below, but only the last result appears.

for k, g in groupby(sorted(data), key=lambda x: x[0]):
    vals = [(int(t[1]), int(t[2])) for t in g]
    print (max(i[1] for i in vals) - min(i[0] for i in vals))
    test_lst = []
    test_lst.append((max(i[1] for i in vals) - min(i[0] for i in vals)))

I also tried this but got the same result:

for i in vals:
    test_lst2 = []
    test_lst2.append((max(i[1] for i in vals) - min(i[0] for i in vals)))

For this kind of loop, what is the best way to collect the results in a list?

Thanks again.

EDIT 3 - 31/10/2018

test_lst = []
for k, g in groupby(sorted(data), key=lambda x: x[0]):
    vals = [(int(t[1]), int(t[2])) for t in g]
    print (max(i[1] for i in vals) - min(i[0] for i in vals))
    test_lst.append((max(i[1] for i in vals) - min(i[0] for i in vals)))

Solution for collecting the loop results: the empty list must be created outside the loop, otherwise it is reset on every iteration. See @slider's comments on his answer below.
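If the differences need to stay associated with their IDs, one option (a sketch, not taken from the thread) is to store them in a dictionary keyed by ID instead of a bare list. The name spans and the use of operator.itemgetter in place of the lambda are choices made for this example; as in EDIT 3, the container is created once, before the loop.

```python
from itertools import groupby
from operator import itemgetter

data = [
    ('A', '59', '62'), ('A', '2', '6'), ('A', '87', '92'),
    ('A', '98', '104'), ('A', '111', '117'),
    ('B', '66', '71'), ('B', '25', '31'), ('B', '34', '40'), ('B', '46', '53'),
    ('B', '245', '251'), ('B', '235', '239'), ('B', '224', '229'), ('B', '135', '140'),
    ('C', '157', '162'), ('C', '203', '208'),
    ('D', '166', '173'), ('D', '176', '183'),
    ('E', '59', '62'), ('E', '2', '6'), ('E', '87', '92'), ('E', '98', '104'), ('E', '111', '117')
]

spans = {}  # created once, before the loop, so results accumulate
for key, group in groupby(sorted(data), key=itemgetter(0)):
    vals = [(int(lo), int(hi)) for _, lo, hi in group]
    # max of the 3rd elements minus min of the 2nd elements for this ID
    spans[key] = max(hi for _, hi in vals) - min(lo for lo, _ in vals)

print(spans)                 # {'A': 115, 'B': 226, 'C': 51, 'D': 17, 'E': 115}
print(list(spans.values()))  # [115, 226, 51, 17, 115]
```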

You can use itertools.groupby to first group by the "id" key, and then compute the min and max for each group:

from itertools import groupby

groups = []
for k, g in groupby(sorted(data), key=lambda x: x[0]):
    groups.append(list(g))

for g in groups:
    print(g[0][0], 'min:', min(int(i[1]) for i in g), 'max:', max(int(i[2]) for i in g))

Output

A min: 2 max: 117
B min: 25 max: 251
C min: 157 max: 208
D min: 166 max: 183
E min: 2 max: 117
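The call to sorted() matters here: itertools.groupby only groups consecutive items that share a key, so an ID that reappears later in unsorted input would produce a second, separate group. In the question's data each ID is already contiguous, so sorting acts as a safeguard. A small illustration with made-up pairs:

```python
from itertools import groupby

# Toy input (hypothetical) where 'A' appears again after 'B'.
pairs = [('A', 1), ('B', 2), ('A', 3)]

unsorted_keys = [k for k, _ in groupby(pairs, key=lambda x: x[0])]
sorted_keys = [k for k, _ in groupby(sorted(pairs), key=lambda x: x[0])]

print(unsorted_keys)  # ['A', 'B', 'A']
print(sorted_keys)    # ['A', 'B']
```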

Note that you don't have to store the groups in the groups list first; you can print the min and max directly while iterating in the groupby for loop:

for k, g in groupby(sorted(data), key=lambda x: x[0]):
    vals = [(int(t[1]), int(t[2])) for t in g]
    print(k, 'min:', min(i[0] for i in vals), 'max:', max(i[1] for i in vals))
data = [('A', '59', '62'), ('A', '2', '6'), ('A', '87', '92'), ('A', '98', '104'), ('A', '111', '117'), ('B', '66', '71'), ('B', '25', '31'), ('B', '34', '40'), ('B', '46', '53'), ('B', '245', '251'), ('B', '235', '239'), ('B', '224', '229'), ('B', '135', '140'), ('C', '157', '162'), ('C', '203', '208'), ('D', '166', '173'), ('D', '176', '183'), ('E', '59', '62'), ('E', '2', '6'), ('E', '87', '92'), ('E', '98', '104'), ('E', '111', '117')]


result = {}  # construct result dictionary
for i in data:
    cur_min, cur_max = map(int, i[1:])
    min_i, max_i = result.setdefault(i[0], [cur_min, cur_max])
    if cur_min < min_i:
        result[i[0]][0] = cur_min
    if cur_max > max_i:
        result[i[0]][1] = cur_max
# print(result)  # dictionary containing keys with list of min and max values for given key >>> {'A': [2, 117], 'B': [25, 251], 'C': [157, 208], 'D': [166, 183], 'E': [2, 117]}

for k, v in result.items():  # loop to print output
    print("{} min: {} max: {}".format(k, v[0], v[1]))

Output:

A min: 2 max: 117
B min: 25 max: 251
C min: 157 max: 208
D min: 166 max: 183
E min: 2 max: 117
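Because the dictionary approach makes a single pass and does not sort, it can also consume rows lazily, which may help with the bigger dataset mentioned in the question. Below is a sketch of that idea wrapped in a function; the name min_max_by_id and the generator input are illustrative assumptions, not part of the answer above.

```python
def min_max_by_id(rows):
    """Return {id: (min of 2nd element, max of 3rd element)} from (id, lo, hi) rows.

    Makes one pass over rows, so it also works with generators.
    """
    result = {}
    for key, lo, hi in rows:
        lo, hi = int(lo), int(hi)
        if key in result:
            cur_lo, cur_hi = result[key]
            result[key] = (min(cur_lo, lo), max(cur_hi, hi))
        else:
            result[key] = (lo, hi)
    return result


# Usage with a generator over a few rows from the question's data.
rows = (row for row in [('C', '157', '162'), ('C', '203', '208'),
                        ('D', '166', '173'), ('D', '176', '183')])
for key, (lo, hi) in min_max_by_id(rows).items():
    print('{} min: {} max: {}'.format(key, lo, hi))
# C min: 157 max: 208
# D min: 166 max: 183
```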

Another approach:

max_list = {}
min_list = {}
for i in data:
    # First time we see this ID: start from sentinels that any real value beats
    if i[0] not in max_list:
        max_list[i[0]] = float('-inf')
        min_list[i[0]] = float('inf')

    if max_list[i[0]] < int(i[2]):
        max_list[i[0]] = int(i[2])

    if min_list[i[0]] > int(i[1]):
        min_list[i[0]] = int(i[1])

for ele in max_list:
    print(ele, 'min:', min_list[ele], 'max:', max_list[ele])

This is another approach, using the pandas library:

import pandas as pd

#The same dataset you provided us
data = [('A', '59', '62'), ('A', '2', '6'), ('A', '87', '92'), ('A', '98', '104'), ('A', '111', '117'), ('B', '66', '71'), ('B', '25', '31'), ('B', '34', '40'), ('B', '46', '53'), ('B', '245', '251'), ('B', '235', '239'), ('B', '224', '229'), ('B', '135', '140'), ('C', '157', '162'), ('C', '203', '208'), ('D', '166', '173'), ('D', '176', '183'), ('E', '59', '62'), ('E', '2', '6'), ('E', '87', '92'), ('E', '98', '104'), ('E', '111', '117')]

#Generate dataframe df
df = pd.DataFrame(data=data)
#Convert strings to their respective numerical values
df[[1, 2]] = df[[1, 2]].apply(pd.to_numeric)

#Group values using column 0
print(df.groupby(0).agg({1: 'min', 2: 'max'}))

We pass a dictionary to the agg method so that, within each group, it takes the minimum of column 1 and the maximum of column 2.

This gives the following result:

     1    2
0
A    2  117
B   25  251
C  157  208
D  166  183
E    2  117

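Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you need to republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.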

 