
Pandas: update a column value per group in a groupby using multiple if/else conditions

I have a pandas DataFrame in which three columns, X, Y, and Z, are used for grouping. I want to update column B (or store the result in a separate column) for each group, based on the conditions shown in the code. But all I get as the final outcome is nulls, and I'm not sure what I'm doing wrong.

Below is a sample of the table (it does not cover every case, but the code handles all of them):

[Image: sample of the table]

group=df.groupby(['X','Y','Z'])
for a,b in group:
    if ((b.colA==2).all()):
        df['colB']=b.colB.max()
    elif (((b.colA>2).all()) and (b.colB.max() >=2)):
        df['colB']=b.colB.max()
    elif (((b.ColC.str.isdigit()).all()) and ((b.ColC.str.len()==2).all())):
        df['colB']=b.ColC.str[0].max()
    elif (((b.ColC.str.isdigit()).all()) and ((b.ColC.str.len()>2).all())):
        df['ColB']=b.ColC.str[:-2].max()
    elif ((b.ColC.str[0].str.isdigit().all()) and (b.ColC.str.contains('[A-Z]').all()) and
          (b.ColC.str[-1].str.isalpha().all())):
        df['colB']=b.ColC.str[:-1].astype(float).max()
    elif (b.ColC.str[0].str.isalpha().all() and b.ColC.str.contains('[0-9]').all()):
        df['ColB']=len(set(" ".join(re.findall("[A-Z]+", str(b.ColC)))))
    else:
        df['colB']=np.nan

The main flaw in your code is that each branch assigns a value to the whole colB column, whereas it should be set only in the rows of the current group. As a result, every iteration overwrites whatever the previous iterations wrote.
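To make the overwrite concrete, here is a minimal sketch on a hypothetical two-group frame (the data and the names overwritten and per_group are made up for illustration, not taken from the question). It contrasts whole-column assignment with assignment restricted to the current group's rows through b.index:

```python
import pandas as pd

# Hypothetical two-group frame, used only to illustrate the overwrite.
df = pd.DataFrame({
    "X": ["A", "A", "D", "D"],
    "colB": [3, 1, 4, 2],
})

overwritten = df.copy()
per_group = df.copy()

for _, b in df.groupby("X"):
    # Whole-column assignment: replaces every row on each iteration,
    # so only the value computed for the last group survives.
    overwritten["colB"] = b.colB.max()
    # Row-restricted assignment: b.index selects the current group's rows only.
    per_group.loc[b.index, "colB"] = b.colB.max()

print(overwritten["colB"].tolist())  # [4, 4, 4, 4]
print(per_group["colB"].tolist())    # [3, 3, 4, 4]
```

If the last group processed falls through to the else branch, the whole-column variant leaves NaN in every row, which would be consistent with the all-null outcome described in the question.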

To do your task the right way, define a function to be applied to each group:

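import numpy as np
import pandas as pd

def myFun(b):
    # b is the sub-DataFrame holding the rows of one (X, Y, Z) group
    if (b.colA == 2).all():
        rv = b.colB.max()
    elif (b.colA > 2).all() and (b.colB.max() >= 2):
        rv = b.colB.max()
    # colC holds only two-digit strings: take the largest first digit
    elif b.colC.str.isdigit().all() and (b.colC.str.len() == 2).all():
        rv = b.colC.str[0].max()
    # colC holds only digit strings longer than two characters
    elif b.colC.str.isdigit().all() and (b.colC.str.len() > 2).all():
        rv = b.colC.str[:-2].max()
    # colC starts with a digit and ends with a letter, e.g. '2H'
    elif b.colC.str[0].str.isdigit().all() and b.colC.str[-1].str.isalpha().all():
        rv = b.colC.str[:-1].astype(int).max()
    # colC starts with a letter (str[0], as in the question) and contains a digit
    elif b.colC.str[0].str.isalpha().all() and b.colC.str.contains('[0-9]').all():
        rv = len(set("".join(b.colC.str.extract("([A-Z]+)")[0])))
    else:
        rv = np.nan
    # Broadcast the single group value to every row of the group
    return pd.Series(rv, index=b.index)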

Another flaw is in your data. The last group ('J', 'K', 'L') would be handled by the first if branch. To have it handled by the fifth branch instead, I put 0 in colA for this group, so that the source DataFrame contains:

   X  Y  Z  colA  colB colC
0  A  B  C     2     3  NaN
1  A  B  C     2     1  NaN
2  D  E  F     3     4  NaN
3  D  E  F     3     1  NaN
4  D  E  F     3     2  NaN
5  G  H  I     3     0   35
6  G  H  I     3     0   63
7  G  H  I     3     0   78
8  J  K  L     0     0   2H
9  J  K  L     0     0   5B
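For anyone who wants to reproduce this, the table above can be built roughly as follows. This is a sketch under one assumption the printout cannot confirm: that colC holds strings (with NaN for missing values), which is what the .str accessor in myFun needs.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "X": list("AADDDGGGJJ"),
    "Y": list("BBEEEHHHKK"),
    "Z": list("CCFFFIIILL"),
    "colA": [2, 2, 3, 3, 3, 3, 3, 3, 0, 0],
    "colB": [3, 1, 4, 1, 2, 0, 0, 0, 0, 0],
    # Assumption: colC is stored as strings, NaN where the table shows NaN
    "colC": [np.nan] * 5 + ["35", "63", "78", "2H", "5B"],
})

print(df)
```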

And to fill the result column, run:

df['Result'] = df.groupby(['X','Y','Z'], group_keys=False).apply(myFun)
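The assignment lines up with the right rows because each call returns a Series indexed by the group's own row labels, and pandas aligns on the index when assigning a Series to a column. A small sketch of that mechanism, using a hypothetical frame with interleaved groups and a simple per-group maximum instead of myFun:

```python
import pandas as pd

# Hypothetical frame whose groups are interleaved, to show index alignment.
df = pd.DataFrame({
    "X": ["A", "D", "A", "D"],
    "colB": [3, 4, 1, 2],
})

# For each group, return one value repeated over that group's row labels.
result = df.groupby("X", group_keys=False)["colB"].apply(
    lambda s: pd.Series(s.max(), index=s.index)
)

# Assignment aligns on the index, so each row receives its own group's value.
df["Result"] = result

print(df["Result"].tolist())  # [3, 4, 3, 4]
```

Selecting the column before apply is a choice made for this sketch only; myFun needs several columns, so it is applied to the whole group.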

The result is:

   X  Y  Z  colA  colB colC Result
0  A  B  C     2     3  NaN      3
1  A  B  C     2     1  NaN      3
2  D  E  F     3     4  NaN      4
3  D  E  F     3     1  NaN      4
4  D  E  F     3     2  NaN      4
5  G  H  I     3     0   35      7
6  G  H  I     3     0   63      7
7  G  H  I     3     0   78      7
8  J  K  L     0     0   2H      5
9  J  K  L     0     0   5B      5

Or, to place the result in colB, change the output column name in the above code.
