Pandas update column value based on values of groupby having multiple if else

I have a pandas DataFrame in which three columns, X, Y, and Z, are used for grouping. I want to update column B (or store the result in a separate column) for each group based on the conditions shown in the code. But all I get as the final outcome is nulls. I'm not sure what I'm doing wrong.

Below is a sample of the table (it does not cover all the cases, but the code handles them):

[Image: sample table]

group=df.groupby(['X','Y','Z'])
for a,b in group:
    if ((b.colA==2).all()):
        df['colB']=b.colB.max()
    elif (((b.colA>2).all()) and (b.colB.max() >=2)):
        df['colB']=b.colB.max()
    elif (((b.ColC.str.isdigit()).all()) and ((b.ColC.str.len()==2).all())):
        df['colB']=b.ColC.str[0].max()
    elif (((b.ColC.str.isdigit()).all()) and ((b.ColC.str.len()>2).all())):
        df['ColB']=b.ColC.str[:-2].max()
    elif ((b.ColC.str[0].str.isdigit().all()) and (b.ColC.str.contains('[A-Z]').all()) and
          (b.ColC.str[-1].str.isalpha().all())):
        df['colB']=b.ColC.str[:-1].astype(float).max()
    elif (b.ColC.str[0].str.isalpha().all() and b.ColC.str.contains('[0-9]').all()):
        df['ColB']=len(set(" ".join(re.findall("[A-Z]+", str(b.ColC)))))
    else:
        df['colB']=np.nan

The main flaw in your code is that you assign a value to the whole colB column, whereas it should be set only in the rows of the current group.
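To see the overwrite in isolation, here is a minimal sketch on a made-up two-group frame (the toy data and the names `overwritten` and `scoped` are illustrative assumptions, not part of the question). It contrasts assigning to the whole column inside the loop with assigning only to the current group's rows through `df.loc`:

```python
import pandas as pd

# Hypothetical toy frame (not the asker's data): two groups keyed by 'X'.
df = pd.DataFrame({"X": ["A", "A", "D", "D"], "colB": [3, 1, 4, 2]})

# Whole-column assignment: every iteration overwrites all rows,
# so only the value computed for the last group survives.
overwritten = df.copy()
for key, grp in df.groupby("X"):
    overwritten["colB"] = grp["colB"].max()

# Group-scoped assignment: only the rows whose index labels
# belong to the current group are changed.
scoped = df.copy()
for key, grp in df.groupby("X"):
    scoped.loc[grp.index, "colB"] = grp["colB"].max()

print(overwritten["colB"].tolist())  # [4, 4, 4, 4]
print(scoped["colB"].tolist())       # [3, 3, 4, 4]
```

In the question's loop, whichever branch runs for the last group decides the whole column, which would explain a column full of nulls if that group ends in the else branch.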

To do the task properly, define a function to be applied to each group:

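import numpy as np
import pandas as pd

def myFun(b):
    # b holds the rows of a single (X, Y, Z) group
    if (b.colA == 2).all():
        rv = b.colB.max()
    elif (b.colA > 2).all() and (b.colB.max() >= 2):
        rv = b.colB.max()
    elif b.colC.str.isdigit().all() and (b.colC.str.len() == 2).all():
        # all two-character digit strings: largest first character
        rv = b.colC.str[0].max()
    elif b.colC.str.isdigit().all() and (b.colC.str.len() > 2).all():
        # longer digit strings: string maximum after dropping the last two characters
        rv = b.colC.str[:-2].max()
    elif b.colC.str[0].str.isdigit().all() and b.colC.str[-1].str.isalpha().all():
        # digit first, letter last: largest number left after dropping the last character
        rv = b.colC.str[:-1].astype(int).max()
    elif b.colC.str[0].str.isalpha().all() and b.colC.str.contains('[0-9]').all():
        # letter first and a digit somewhere: count the distinct capital letters
        rv = len(set("".join(b.colC.str.extract("([A-Z]+)")[0])))
    else:
        rv = np.nan
    # broadcast the group's value to every row of the group
    return pd.Series(rv, index=b.index)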

Another flaw is in your data. The last group ('J', 'K', 'L') would be processed by the first if path. So that it is processed by the fifth path instead, I put 0 in colA for this group, so the source DataFrame contains:

   X  Y  Z  colA  colB colC
0  A  B  C     2     3  NaN
1  A  B  C     2     1  NaN
2  D  E  F     3     4  NaN
3  D  E  F     3     1  NaN
4  D  E  F     3     2  NaN
5  G  H  I     3     0   35
6  G  H  I     3     0   63
7  G  H  I     3     0   78
8  J  K  L     0     0   2H
9  J  K  L     0     0   5B
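For anyone who wants to reproduce this, the table above could be rebuilt roughly as follows. This is a sketch that assumes colC holds strings (with NaN for the missing entries), which is what the `.str` calls in the function require; the values are copied from the table:

```python
import numpy as np
import pandas as pd

# Values copied from the table above.
df = pd.DataFrame({
    "X": list("AADDDGGGJJ"),
    "Y": list("BBEEEHHHKK"),
    "Z": list("CCFFFIIILL"),
    "colA": [2, 2, 3, 3, 3, 3, 3, 3, 0, 0],
    "colB": [3, 1, 4, 1, 2, 0, 0, 0, 0, 0],
    # Assumption: colC is stored as strings so the .str accessor works on it.
    "colC": [np.nan] * 5 + ["35", "63", "78", "2H", "5B"],
})
print(df)
```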

To fill the result column, run:

df['Result'] = df.groupby(['X','Y','Z'], group_keys=False).apply(myFun)

The result is:

   X  Y  Z  colA  colB colC Result
0  A  B  C     2     3  NaN      3
1  A  B  C     2     1  NaN      3
2  D  E  F     3     4  NaN      4
3  D  E  F     3     1  NaN      4
4  D  E  F     3     2  NaN      4
5  G  H  I     3     0   35      7
6  G  H  I     3     0   63      7
7  G  H  I     3     0   78      7
8  J  K  L     0     0   2H      5
9  J  K  L     0     0   5B      5

Or, to place the result in colB, change the output column name in the code above.
