
Pandas calculate minimum for pairs of rows

I have the following DataFrame:

import pandas as pd

data = {'col1': ['A', 'B', 'A', 'B', 'A', 'B'],
        'col2': ['0', '2', '0', '1', '0', '0.5']}
df = pd.DataFrame.from_dict(data)

df 

   col1 col2
0   A   0
1   B   2
2   A   0
3   B   1
4   A   0
5   B   0.5

There are three pairs of rows (A, B). For each pair I calculate the absolute difference of the numbers in col2. My goal is to get the minimum absolute difference over the three pairs and the corresponding index. In this case, that is 0.5 and 4, respectively.
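For reference, the target computation can be written out directly with NumPy. This is a minimal sketch that assumes the rows strictly alternate A, B (as in the sample), so rows 2k and 2k+1 form one pair; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

data = {'col1': ['A', 'B', 'A', 'B', 'A', 'B'],
        'col2': ['0', '2', '0', '1', '0', '0.5']}
df = pd.DataFrame(data)

# Assumes strict A, B alternation: each row of `values` is one [A, B] pair.
values = df['col2'].astype(float).to_numpy().reshape(-1, 2)
abs_diff = np.abs(values[:, 0] - values[:, 1])  # [2.0, 1.0, 0.5]
k = int(abs_diff.argmin())                      # position of the best pair
min_diff = abs_diff[k]
min_index = df.index[2 * k]                     # index label of that pair's A row

print(min_diff, min_index)  # 0.5 4
```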

I already tried:

(df[df["col1"] == "A"]["col2"] - df[df["col1"] == "B"]["col2"]).abs().min()

But I ran into a problem with the index.

Does anyone have an idea? Thanks.
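Two things get in the way of the attempt above: col2 holds strings, and subtracting two Series aligns them on index labels. The A rows carry labels 0, 2, 4 and the B rows carry 1, 3, 5, so no label matches and every entry of the difference is NaN. A minimal sketch of the effect, assuming the sample data and that the n-th A row pairs with the n-th B row:

```python
import pandas as pd

data = {'col1': ['A', 'B', 'A', 'B', 'A', 'B'],
        'col2': ['0', '2', '0', '1', '0', '0.5']}
df = pd.DataFrame(data)
df['col2'] = df['col2'].astype(float)  # the strings must become numbers first

a = df[df['col1'] == 'A']['col2']  # index 0, 2, 4
b = df[df['col1'] == 'B']['col2']  # index 1, 3, 5

# Series arithmetic aligns on index labels; no label appears in both,
# so every entry of the result is NaN and .min() would return NaN.
aligned = (a - b).abs()
print(aligned.tolist())  # [nan, nan, nan, nan, nan, nan]

# Subtracting a plain array drops B's labels: rows pair up by position
# and A's index is kept, so idxmin() points at the A row of the best pair.
positional = (a - b.to_numpy()).abs()
print(positional.min(), positional.idxmin())  # 0.5 4
```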

Try this:

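df['col2'] = df['col2'].astype(float)  # col2 holds strings; diff() needs numbers
# Reverse the rows so that, within each pair, diff() stores A - B on the A row
s = df.iloc[::-1].groupby(df.index // 2).col2.diff().abs()
out = s.agg(['min', 'idxmin'])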
Out[193]: 
min       0.5
idxmin    4.0
Name: col2, dtype: float64
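A possible variant that skips the reversal, assuming a pandas version whose `GroupBy.diff` accepts a negative `periods` value: `diff(-1)` subtracts the next row within each group, so the A row receives A - B directly and the B row gets NaN. `pair_id` is just an illustrative name.

```python
import pandas as pd

data = {'col1': ['A', 'B', 'A', 'B', 'A', 'B'],
        'col2': ['0', '2', '0', '1', '0', '0.5']}
df = pd.DataFrame(data)
df['col2'] = df['col2'].astype(float)

pair_id = df.index // 2  # 0, 0, 1, 1, 2, 2: one label per consecutive (A, B) pair
# diff(-1) computes "this row minus the next row" inside each pair,
# so each A row gets A - B and each B row gets NaN.
s = df.groupby(pair_id)['col2'].diff(-1).abs()
print(s.tolist())  # [2.0, nan, 1.0, nan, 0.5, nan]

# out['min'] is 0.5 and out['idxmin'] is 4, the label of the A row.
out = s.agg(['min', 'idxmin'])
```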

I think you are looking for this:

# np.float was removed in NumPy 1.24; the builtin float works everywhere
df['col2'] = df['col2'].astype(float)
df[df.col2 == df[df.col2 > 0].col2.min()]

Correct?
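The filter above compares raw col2 values rather than pairwise differences. On the sample data it lands on row 5 (the B row of the best pair) because every A value is 0, so each |A - B| equals its B value. A quick check, assuming the same sample data and strict A, B alternation:

```python
import pandas as pd

data = {'col1': ['A', 'B', 'A', 'B', 'A', 'B'],
        'col2': ['0', '2', '0', '1', '0', '0.5']}
df = pd.DataFrame(data)
df['col2'] = df['col2'].astype(float)

# The answer's filter: rows whose col2 equals the smallest positive col2 value.
picked = df[df.col2 == df[df.col2 > 0].col2.min()]
print(picked.index.tolist())  # [5]

# Pairwise |A - B| for comparison (A rows at 0, 2, 4; B rows at 1, 3, 5).
pair_abs_diff = abs(df.col2.iloc[0::2].to_numpy() - df.col2.iloc[1::2].to_numpy())
print(pair_abs_diff.tolist())  # [2.0, 1.0, 0.5]
```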

In one line of code:

df.col2.astype(float).diff().abs().shift(-1)[::2].agg(['idxmin', 'min']).values.tolist()

It returns a list with:

  • the index of the row that starts the (A, B) pair with the minimum absolute difference;
  • that minimum absolute difference.

Here is the output:

[4.0, 0.5]
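The one-liner is easier to follow when split into its steps. A sketch with illustrative intermediate names, assuming the same alternating A, B layout:

```python
import pandas as pd

data = {'col1': ['A', 'B', 'A', 'B', 'A', 'B'],
        'col2': ['0', '2', '0', '1', '0', '0.5']}
df = pd.DataFrame(data)
col2 = df.col2.astype(float)  # diff() needs numbers, not strings

step1 = col2.diff().abs()  # |row i - row i-1|: on each B row this is |B - A|
step2 = step1.shift(-1)    # move each |B - A| up onto its A row
step3 = step2[::2]         # keep only the A rows (index 0, 2, 4)
print(step3.tolist())               # [2.0, 1.0, 0.5]
print(step3.idxmin(), step3.min())  # 4 0.5
```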

To fix the indexing problem you asked about, use .loc with the comparison expression as the row indexer and 'col2' as the column indexer. I've added astype(float) so the arithmetic later works on numbers instead of strings.

>>> x = df.loc[df.col1=='A','col2'].astype(float)
>>> y = df.loc[df.col1=='B','col2'].astype(float)
>>> x
0    0.0
2    0.0
4    0.0
Name: col2, dtype: float64
>>> y
1    2.0
3    1.0
5    0.5
Name: col2, dtype: float64

To subtract the resulting Series, reindex the 'B' Series like the 'A' Series, back-filling each A label from the next B label, so the result keeps the original index of each 'A' row.

>>> z = x - y.reindex_like(x,method='bfill')
>>> z
0   -2.0
2   -1.0
4   -0.5
Name: col2, dtype: float64

Extract what you are looking for:

>>> z.abs().agg(['min', 'idxmin'])
min       0.5
idxmin    4.0
Name: col2, dtype: float64

Unfortunately, it is not a one-liner.
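The `reindex_like(x, method='bfill')` call is what pairs each A row with the B row after it: every A label takes the value at the next B label. The sketch below prints that intermediate Series and, as an assumed alternative, relabels the B rows by subtracting 1 from their index, which only works if each B row immediately follows its A row.

```python
import pandas as pd

data = {'col1': ['A', 'B', 'A', 'B', 'A', 'B'],
        'col2': ['0', '2', '0', '1', '0', '0.5']}
df = pd.DataFrame(data)
x = df.loc[df.col1 == 'A', 'col2'].astype(float)  # index 0, 2, 4
y = df.loc[df.col1 == 'B', 'col2'].astype(float)  # index 1, 3, 5

# Back-fill: label 0 takes y[1], label 2 takes y[3], label 4 takes y[5].
y_on_a = y.reindex_like(x, method='bfill')
print(y_on_a.index.tolist(), y_on_a.tolist())  # [0, 2, 4] [2.0, 1.0, 0.5]

# Alternative relabeling without filling (assumes each B directly follows its A).
y_shifted = y.set_axis(y.index - 1)
z2 = x - y_shifted
print(z2.tolist())  # [-2.0, -1.0, -0.5]
```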

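Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM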