Pandas dataframe - pairing rows within buckets

Question

I have a dataframe that looks like this:

       bucket  type   v
0         1    X      14
1         1    X      10
2         1    Y      11
3         1    X      15
4         2    X      16
5         2    Y      9
6         2    Y      10
7         3    Y      20
8         3    X      18
9         3    Y      15
10        3    X      14

The desired output looks like this:

       bucket  type   v    v_paired
0         1    X      14   nan   (no Y before it in the same bucket)
1         1    X      10   nan   (no Y before it in the same bucket)
2         1    Y      11   14    (highest X before this row in bucket 1)
3         1    X      15   11    (lowest Y before this row in bucket 1)

4         2    X      16   nan   (no Y before it in the same bucket)
5         2    Y      9    16    (highest X before it in the same bucket)
6         2    Y      10   16    (highest X before it in the same bucket)

7         3    Y      20   nan   (no X before it in the same bucket)
8         3    X      18   20    (single Y before it in the same bucket)
9         3    Y      15   18    (single X before it in the same bucket)
10        3    X      14   15    (lowest Y before it in the same bucket)

The goal is to construct the v_paired column according to these rules:

Look for rows in the same bucket, coming before this one, that have the opposite type (X vs. Y). Call these the 'pair candidates'.
If the current row is X, choose the minimum v among the pair candidates as v_paired for the current row; if the current row is Y, choose the maximum v among the pair candidates. If there are no pair candidates, v_paired is NaN.
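To pin down the semantics, here is a direct, deliberately unoptimized transcription of the two rules into plain Python. The helper name `pair_values` and the tuple layout are hypothetical, not from the question, and the sample data assumes row 0 is in bucket 1, which is what the desired output (v_paired = 14 for row 2) implies.

```python
import math

def pair_values(rows):
    """Hypothetical helper: compute v_paired for (bucket, type, v) tuples in row order.

    A literal O(n^2) transcription of the rules, useful for checking faster versions.
    """
    paired = []
    for i, (bucket, row_type, v) in enumerate(rows):
        # Pair candidates: earlier rows in the same bucket with the opposite type
        candidates = [cv for cb, ct, cv in rows[:i] if cb == bucket and ct != row_type]
        if not candidates:
            paired.append(math.nan)          # nothing to pair with
        elif row_type == 'X':
            paired.append(min(candidates))   # X pairs with the lowest earlier Y
        else:
            paired.append(max(candidates))   # Y pairs with the highest earlier X
    return paired

# Sample data, assuming row 0 belongs to bucket 1 (as the desired output implies)
rows = [
    (1, 'X', 14), (1, 'X', 10), (1, 'Y', 11), (1, 'X', 15),
    (2, 'X', 16), (2, 'Y', 9), (2, 'Y', 10),
    (3, 'Y', 20), (3, 'X', 18), (3, 'Y', 15), (3, 'X', 14),
]
result = pair_values(rows)
# result: [nan, nan, 14, 11, nan, 16, 16, nan, 20, 18, 15]
```

Because the opposite-type filter never matches the current row itself, "coming before this one" and "up to and including this one" give the same candidates here.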

Thanks in advance.

Answer 1

I believe this has to be done sequentially, row by row within each bucket. First, group by bucket:

import pandas as pd

groups = df.groupby('bucket', group_keys=False)

This function will be applied to each bucket group:

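def func(group):
    # Running extremes of each type seen so far in this bucket
    y_value = None  # lowest Y so far
    x_value = None  # highest X so far
    result = []
    for value_type, value in zip(group['type'], group['v']):
        if value_type == 'X':
            # An X row pairs with the lowest earlier Y
            result.append(y_value)
            x_value = value if x_value is None else max(x_value, value)
        elif value_type == 'Y':
            # A Y row pairs with the highest earlier X
            result.append(x_value)
            y_value = value if y_value is None else min(y_value, value)
    # Keep the group's index so the result aligns with df; None becomes NaN
    return pd.Series(result, index=group.index, dtype=float)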

df['v_paired'] = groups.apply(func)

Hopefully this will do the job.
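As an alternative to the row-by-row loop, the same pairing can probably be vectorized with per-bucket running maxima/minima. This is a sketch, not part of the original answer: `running_extreme` is a hypothetical helper, and it assumes `type` only contains 'X' and 'Y', that rows are already in the intended order within each bucket, and that row 0 belongs to bucket 1 (as the desired output implies).

```python
import numpy as np
import pandas as pd

# Sample data, assuming row 0 belongs to bucket 1 (as the desired output implies)
df = pd.DataFrame({
    'bucket': [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3],
    'type':   ['X', 'X', 'Y', 'X', 'X', 'Y', 'Y', 'Y', 'X', 'Y', 'X'],
    'v':      [14, 10, 11, 15, 16, 9, 10, 20, 18, 15, 14],
})

def running_extreme(values, groups, how):
    """Hypothetical helper: per-group running max/min that carries the last value forward."""
    grouped = values.groupby(groups)
    running = grouped.cummax() if how == 'max' else grouped.cummin()
    # cummax/cummin leave NaN at masked rows; forward-fill within each bucket
    return running.groupby(groups).ffill()

is_x = df['type'] == 'X'
# Highest X seen so far in the bucket (Y rows masked out as NaN)
max_x = running_extreme(df['v'].where(is_x), df['bucket'], 'max')
# Lowest Y seen so far in the bucket (X rows masked out as NaN; assumes only X/Y types)
min_y = running_extreme(df['v'].where(~is_x), df['bucket'], 'min')

# X rows take the lowest earlier Y, Y rows take the highest earlier X.
# The current row is never its own candidate: it is masked out of the opposite-type series.
df['v_paired'] = np.where(is_x, min_y, max_x)
```

The design choice here is to precompute both running extremes for every row and then pick the relevant one per row, which avoids `iterrows` and scales linearly with the number of rows.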

Answer 1
0 2015-04-16 21:52:55
