
Pandas dataframe - pairing off rows within a bucket

I have a dataframe that looks like this:

       bucket  type   v
0         1    X      14
1         1    X      10
2         1    Y      11
3         1    X      15
4         2    X      16
5         2    Y      9
6         2    Y      10
7         3    Y      20
8         3    X      18
9         3    Y      15
10        3    X      14

The desired output looks like this:

       bucket  type   v    v_paired
0         1    X      14   nan  (no Y coming before it)
1         1    X      10   nan  (no Y coming before it)
2         1    Y      11   14   (highest X in bucket 1 before this row)
3         1    X      15   11   (lowest Y in bucket 1 before this row)

4         2    X      16   nan  (no Y coming before it in the same bucket)
5         2    Y      9    16   (highest X in same bucket coming before)
6         2    Y      10   16   (highest X in same bucket coming before)

7         3    Y      20   nan  (no X coming before it in the same bucket)
8         3    X      18   20   (single Y coming before it in same bucket)
9         3    Y      15   18   (single X coming before it in same bucket)
10        3    X      14   15   (smallest Y coming before it in same bucket)

The goal is to construct the v_paired column, and the rules are as follows:

  • Look for rows in the same bucket, coming before this one, that have the opposite type (X vs Y); call these 'pair candidates'.

  • If the current row is X, choose the min v out of the pair candidates to become v_paired for the current row; if the current row is Y, choose the max v out of the pair candidates instead.
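The two rules above can be written down literally as a brute-force reference, which is handy for checking any faster solution. This is only a sketch: the helper name pair_value is hypothetical, row 0 is placed in bucket 1 (as the desired output shows), and a row with no pair candidates is assumed to get NaN.

```python
import numpy as np
import pandas as pd

# Sample data from the question (row 0 taken to be in bucket 1,
# matching the desired output).
df = pd.DataFrame({
    'bucket': [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3],
    'type':   list('XXYXXYYYXYX'),
    'v':      [14, 10, 11, 15, 16, 9, 10, 20, 18, 15, 14],
})

def pair_value(df, i):
    """Hypothetical helper: apply the pairing rules to the row at position i."""
    row = df.iloc[i]
    earlier = df.iloc[:i]  # only rows coming before the current one
    # Pair candidates: same bucket, opposite type.
    mask = (earlier['bucket'] == row['bucket']) & (earlier['type'] != row['type'])
    candidates = earlier.loc[mask, 'v']
    if candidates.empty:
        return np.nan
    # X rows take the smallest candidate, Y rows the largest.
    return candidates.min() if row['type'] == 'X' else candidates.max()

v_paired_ref = [pair_value(df, i) for i in range(len(df))]
```

This rescans all earlier rows for every row, so it is quadratic in the number of rows; it is meant as a specification of the rules rather than something to run on large data.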

Thanks in advance.

I believe this has to be done sequentially. First, group by bucket:

groups = df.groupby('bucket', group_keys=False)

This function will be applied to each bucket group:

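import numpy as np
import pandas as pd

def func(group):
    x_max = np.nan  # highest X value seen so far in this bucket
    y_min = np.nan  # lowest Y value seen so far in this bucket
    result = []
    for value_type, value in zip(group['type'], group['v']):
        if value_type == 'X':
            # an X row pairs with the lowest Y that came before it
            result.append(y_min)
            x_max = np.nanmax([x_max, value])
        elif value_type == 'Y':
            # a Y row pairs with the highest X that came before it
            result.append(x_max)
            y_min = np.nanmin([y_min, value])
    # keep the group's own index so the result aligns with df on assignment
    return pd.Series(result, index=group.index)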

df['v_paired'] = groups.apply(func)

Hopefully this will do the job.
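For larger frames, the same running max/min idea can probably be expressed without a Python-level loop. The following is a sketch of an alternative, not the approach given above: it assumes v never legitimately contains +/-inf (those values are used as placeholders), and it again places row 0 in bucket 1 as the desired output shows. The intermediate names is_x, x_max and y_min are just illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'bucket': [1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3],
    'type':   list('XXYXXYYYXYX'),
    'v':      [14, 10, 11, 15, 16, 9, 10, 20, 18, 15, 14],
})

is_x = df['type'] == 'X'

# Running max of X values per bucket; Y rows contribute -inf so they
# never raise the maximum. A leftover -inf means "no X seen yet".
x_max = (df['v'].where(is_x, -np.inf)
         .groupby(df['bucket']).cummax()
         .replace(-np.inf, np.nan))

# Running min of Y values per bucket; X rows contribute +inf.
y_min = (df['v'].where(~is_x, np.inf)
         .groupby(df['bucket']).cummin()
         .replace(np.inf, np.nan))

# X rows take the running Y minimum, Y rows take the running X maximum.
df['v_paired'] = y_min.where(is_x, x_max)
```

Because a row never contributes to the running statistic of the opposite type, including the current row in the cumulative operation does not change what it gets paired with.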

