How to create groups of elements and choose the largest value?

Hi Stack Overflow community!

I have the following set of data:

0 A 0.000027769231 1 B 0.000030287440 0.628306 0.988151 1
0 A 0.000027479497 2 C 0.000035937793 0.581428 0.976041 1
1 B 0.000030287440 2 C 0.000035532483 0.516033 0.987388 1
4 D 0.000011085990 5 E 0.000008163211 0.577556 0.943583 1
4 D 0.000010787916 8 F 0.000008873166 0.531686 0.954017 1
5 E 0.000007865264 8 F 0.000008873166 0.691516 0.989945 1
311 G 0.000006216949 312 H 0.000002510852 0.829361 0.983148 1
326 M 0.000028129783 327 N 0.000011022112 0.843188 0.915627 1
326 M 0.000027462953 328 O 0.000002167529 1.742349 0.943267 1
326 M 0.000028024026 329 P 0.000005130416 1.263187 0.924010 1
326 M 0.000027630314 330 R 0.000002965539 1.668906 0.935518 1
326 M 0.000027721668 331 S 0.000002614498 1.851544 0.939051 1
326 M 0.000028129332 332 T 0.000003145471 1.742525 0.930186 1
327 N 0.000011020065 328 O 0.000002570277 2.473902 0.943474 1
327 N 0.000011028065 329 P 0.000005235456 1.447848 0.976569 1
327 N 0.000011032158 330 R 0.000003154471 2.303768 0.955479 1
327 N 0.000011025788 331 S 0.000002864823 2.038783 0.946972 1
327 N 0.000011064135 332 T 0.000003183160 1.213611 0.975056 1
328 O 0.000002505234 329 P 0.000005129224 1.549313 0.968629 1
328 O 0.000002452331 330 R 0.000002965465 2.328536 0.981076 1
329 P 0.000005147180 330 R 0.000003095314 2.803627 0.977268 1
329 P 0.000005208069 332 T 0.000003147536 2.658807 0.984912 1
330 R 0.000002967887 331 S 0.000002700052 1.208673 0.987825 1
330 R 0.000003110114 332 T 0.000003145140 2.428988 0.983747 1
331 S 0.000002853757 332 T 0.000003145464 1.551457 0.982276 1
366 I 0.000000326315 367 J 0.000000253986 1.410176 0.961879 1
366 I 0.000000327483 368 K 0.000000110327 1.236265 0.918510 1
366 I 0.000000326939 369 Q 0.000000165208 2.258098 0.907039 1
367 J 0.000000257330 368 K 0.000000113511 2.600934 0.907874 1
367 J 0.000000256872 369 Q 0.000000166861 1.102368 0.937099 1

In each row I have a unique pair of elements, which I denote here by letters. I want to create groups of these elements and choose the largest value from column 3 or 6 in each group. For this dataset I should get 5 groups, each with its elements and the max value from column 3 or 6:

A
B
C
maxval: C: 0.000035937793
D
E
F
maxval: D: 0.000011085990
G
H
maxval: G: 0.000006216949
M
N
O
P
R
S
T
maxval: M: 0.000028129783
I
J
K
Q
maxval: I: 0.000000327483

As you can see, when the same element (e.g. A) appears in more than one row, its values in column 3 differ slightly. However, we can assume that A has the same column 3 value in every case.

As output I want to get three files:

  1. a list of groups with the maxval from column 3 or 6
  2. a list of the elements with the largest value from column 3 or 6. I also want to add column 1 or 4 for each element:
2 C
4 D
311 G
326 M
366 I
  3. a list of the other elements from every group:
0 A
1 B
5 E
8 F
312 H
327 N
328 O
329 P
330 R
331 S
332 T
367 J 
368 K
369 Q

I have no idea how to approach this in Python. Can anyone help me with some advice or code snippets?

I am not sure I am answering exactly what you want, as some parts are unclear to me, but small adjustments can probably be made easily inside the loop.

With the help of pandas and numpy:

import pandas as pd
import numpy as np

We can load the data:

data = pd.read_csv("data.txt", sep=" ", header=None)
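
If the file may contain repeated or trailing spaces, a whitespace regex as the separator could be more forgiving than a single space. The following is only a sketch: the first three rows are inlined so it runs on its own, and the column names are made up here for readability, they are not part of the original data.

```python
import io
import pandas as pd

# First three rows of the data set above, inlined so the sketch is self-contained.
sample = """\
0 A 0.000027769231 1 B 0.000030287440 0.628306 0.988151 1
0 A 0.000027479497 2 C 0.000035937793 0.581428 0.976041 1
1 B 0.000030287440 2 C 0.000035532483 0.516033 0.987388 1
"""

# Hypothetical column names, chosen for this example only.
columns = ["id1", "name1", "val1", "id2", "name2", "val2", "c7", "c8", "c9"]

# sep=r"\s+" splits on any run of whitespace instead of exactly one space.
df = pd.read_csv(io.StringIO(sample), sep=r"\s+", header=None, names=columns)

print(df.shape)
print(df[["name1", "val1", "name2", "val2"]])
```

The rest of this answer keeps the default integer column labels, so `data[1]` and `data[4]` are the two letter columns.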

Next, define a function that merges the pairs into groups:

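# https://stackoverflow.com/questions/39915402/combine-a-list-of-pairs-tuples
def make_equiv_classes(pairs):
    # Map each element to the set (group) it currently belongs to.
    groups = {}
    for (x, y) in pairs:
        xset = groups.get(x, set([x]))
        yset = groups.get(y, set([y]))
        # Merge both groups and point every member at the merged set.
        jset = xset | yset
        for z in jset:
            groups[z] = jset
    # Members of a group share one set object, so duplicates collapse here.
    return set(map(tuple, groups.values()))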
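
To see the grouping step in isolation, here is a small self-contained sketch. Note that it does not use the function above: it swaps in a union-find (disjoint-set) structure, which is another common way to get the same connected groups. The helper name `group_pairs` and the sample pairs are made up for this illustration.

```python
def group_pairs(pairs):
    """Group elements that are connected through pairs (connected components)."""
    parent = {}

    def find(x):
        # Register unseen elements as their own root.
        parent.setdefault(x, x)
        # Walk up to the root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    groups = {}
    for x in list(parent):
        groups.setdefault(find(x), []).append(x)
    return sorted(sorted(members) for members in groups.values())


pairs = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "E"), ("D", "F"), ("G", "H")]
print(group_pairs(pairs))
# [['A', 'B', 'C'], ['D', 'E', 'F'], ['G', 'H']]
```

Either approach should give the same groups; union-find mainly pays off when the number of pairs is large, because it avoids rebuilding sets on every merge.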

Then create our classes (groups) from the two letter columns:

classes = make_equiv_classes( data.values[:,[1,4]] )

Then, for each class:

for cls in classes:
    print(sorted(cls))

    # Rows in which either element of the pair belongs to this class.
    sub_class = data.loc[data[1].isin(cls) | data[4].isin(cls)]
    # Largest value found in column 3 or 6 (0-based columns 2 and 5).
    max_class_value = np.max( sub_class.values[:,[2,5]] )

    # Position of the row that holds that largest value.
    subclass_argmax = np.argmax( np.max( sub_class.values[:,[2,5]], axis=1) )
    # Number of the first element in that row; this is not necessarily
    # the element that holds the maximum.
    data_argmax = sub_class.iloc[subclass_argmax][0]

    first_letter = sub_class.iloc[subclass_argmax][1]
    second_letter = sub_class.iloc[subclass_argmax][4]

    print( "Max Class Value: {}".format(max_class_value))
    print( "Max Class Number: {}".format(data_argmax))
    print( "First letter: {}, Second Letter: {}".format(first_letter, second_letter))
    print( "\n")

It will print:

['M', 'N', 'O', 'P', 'R', 'S', 'T']
Max Class Value: 2.8129783000000003e-05
Max Class Number: 326
First letter: M, Second Letter: N


['G', 'H']
Max Class Value: 6.216949e-06
Max Class Number: 311
First letter: G, Second Letter: H


['D', 'E', 'F']
Max Class Value: 1.108599e-05
Max Class Number: 4
First letter: D, Second Letter: E


['I', 'J', 'K', 'Q']
Max Class Value: 3.27483e-07
Max Class Number: 366
First letter: I, Second Letter: K


['A', 'B', 'C']
Max Class Value: 3.5937793e-05
Max Class Number: 0
First letter: A, Second Letter: C
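
The loop above reports the row that holds the maximum, not the element itself (for the A, B, C group the maximum belongs to C, but the printed number is 0). As a rough sketch of how the three requested files could be produced, one option is to stack both halves of every pair into one long table and work per element. This assumes, as the question allows, that one value per element is enough (the largest one seen is kept). The file names, the `num`/`name`/`val` column names, and the use of a temporary folder are choices made for this example only.

```python
import io
import tempfile
from pathlib import Path

import pandas as pd

# A few rows from the question, inlined so the sketch runs on its own.
sample = """\
0 A 0.000027769231 1 B 0.000030287440 0.628306 0.988151 1
0 A 0.000027479497 2 C 0.000035937793 0.581428 0.976041 1
1 B 0.000030287440 2 C 0.000035532483 0.516033 0.987388 1
4 D 0.000011085990 5 E 0.000008163211 0.577556 0.943583 1
4 D 0.000010787916 8 F 0.000008873166 0.531686 0.954017 1
5 E 0.000007865264 8 F 0.000008873166 0.691516 0.989945 1
311 G 0.000006216949 312 H 0.000002510852 0.829361 0.983148 1
"""
df = pd.read_csv(io.StringIO(sample), sep=r"\s+", header=None)

# Stack both halves of every pair into one long table: number, name, value.
left = df[[0, 1, 2]].rename(columns={0: "num", 1: "name", 2: "val"})
right = df[[3, 4, 5]].rename(columns={3: "num", 4: "name", 5: "val"})
long = pd.concat([left, right], ignore_index=True)

# Keep one value per element: the largest one seen (they differ only slightly).
elements = long.groupby(["num", "name"], as_index=False)["val"].max()

# Give every name a group by merging the groups of each pair.
group_of = {}
for a, b in zip(df[1], df[4]):
    merged = group_of.get(a, {a}) | group_of.get(b, {b})
    for name in merged:
        group_of[name] = merged
# Label each group by its alphabetically first member.
elements["group"] = elements["name"].map(lambda n: min(group_of[n]))

# The element with the largest value in each group, and all the others.
best = elements.loc[elements.groupby("group")["val"].idxmax()]
others = elements.drop(best.index)

out_dir = Path(tempfile.mkdtemp())  # temporary folder; point this at a real directory

# File 1: the members of each group followed by the group's maximum.
with open(out_dir / "groups.txt", "w") as fh:
    for label, grp in elements.groupby("group"):
        top = grp.loc[grp["val"].idxmax()]
        fh.write("\n".join(grp["name"]) + "\n")
        fh.write(f"maxval: {top['name']}: {top['val']:.12f}\n")

# File 2: number and name of each group's maximum element.
best[["num", "name"]].to_csv(out_dir / "max_elements.txt", sep=" ", header=False, index=False)
# File 3: number and name of every other element.
others[["num", "name"]].to_csv(out_dir / "other_elements.txt", sep=" ", header=False, index=False)

print((out_dir / "groups.txt").read_text())
```

If the small differences between repeated values matter, the `max()` in the per-element step could be replaced by `mean()` or `first()` without changing the rest of the sketch.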
