Pandas update column value based on values of groupby having multiple if else

Question

I have a pandas DataFrame in which three columns, X, Y, and Z, are used for grouping. I want to update column B (or store the result in a separate column) for each group, based on the conditions shown in the code. But all I get as the final outcome is nulls, and I'm not sure what I'm doing wrong.

Below is a sample of the table (it does not cover every case, but the code handles all of them):

[image: sample table]

group=df.groupby(['X','Y','Z'])
for a,b in group:
    if ((b.colA==2).all()):
        df['colB']=b.colB.max()
    elif (((b.colA>2).all()) and (b.colB.max() >=2)):
        df['colB']=b.colB.max()
    elif (((b.ColC.str.isdigit()).all()) and ((b.ColC.str.len()==2).all())):
        df['colB']=b.ColC.str[0].max()
    elif (((b.ColC.str.isdigit()).all()) and ((b.ColC.str.len()>2).all())):
        df['ColB']=b.ColC.str[:-2].max()
    elif ((b.ColC.str[0].str.isdigit().all()) and (b.ColC.str.contains('[A-Z]').all()) and
          (b.ColC.str[-1].str.isalpha().all())):
        df['colB']=b.ColC.str[:-1].astype(float).max()
    elif (b.ColC.str[0].str.isalpha().all() and b.ColC.str.contains('[0-9]').all()):
        df['ColB']=len(set(" ".join(re.findall("[A-Z]+", str(b.ColC)))))
    else:
        df['colB']=np.nan

Answer 1

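The main flaw in your code is that each iteration assigns a value to the whole colB column, whereas it should be set only in the rows of the current group. Every pass overwrites the previous one, so only the value from the last group processed survives, and if that group falls through to the else branch, the entire column ends up as NaN.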
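To see the difference on a toy frame (the names g, v and out below are made up for illustration and are not from the question), compare assigning to the whole column with assigning only to the current group's rows through df.loc and the group's index. This is just a minimal sketch of the problem, not the recommended solution:

```python
import pandas as pd

# Hypothetical toy data: two groups, 'a' and 'b'.
df = pd.DataFrame({"g": ["a", "a", "b"], "v": [1, 5, 2]})

whole = df.copy()
per_group = df.copy()
per_group["out"] = float("nan")  # create the target column up front

for _, grp in df.groupby("g"):
    # Whole-column assignment: every row is overwritten on each pass,
    # so only the value from the last group remains.
    whole["out"] = grp["v"].max()
    # Row-restricted assignment: only the current group's rows are set.
    per_group.loc[grp.index, "out"] = grp["v"].max()

print(whole["out"].tolist())      # [2, 2, 2]
print(per_group["out"].tolist())  # [5.0, 5.0, 2.0]
```

The row-restricted loop works, but applying a function per group avoids the explicit loop altogether.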

To do your task the right way, define a function to be applied to each group:

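import numpy as np
import pandas as pd

def myFun(b):
    # b is the sub-DataFrame of one (X, Y, Z) group;
    # the result is one value repeated for every row of that group.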
    if (b.colA == 2).all():
        rv = b.colB.max()
    elif (b.colA > 2).all() and (b.colB.max() >= 2):
        rv = b.colB.max()
    elif (b.colC.str.isdigit()).all() and (b.colC.str.len() == 2).all():
        rv = b.colC.str[0].max()
    elif b.colC.str.isdigit().all() and (b.colC.str.len() > 2).all():
        rv = b.colC.str[:-2].max()
    elif b.colC.str[0].str.isdigit().all() and b.colC.str[-1].str.isalpha().all():
        rv = b.colC.str[:-1].astype(int).max()
    elif b.colC.str[0].str.isalpha().all() and b.colC.str.contains('[0-9]').all():
        rv = len(set("".join(b.colC.str.extract("([A-Z]+)")[0])))
    else:
        rv = np.nan
    return pd.Series(rv, index=b.index)

Another flaw is in your data. The last group ('J', 'K', 'L') would be handled by the first if branch. So that it reaches the fifth branch instead, I put 0 in colA for this group, and the source DataFrame now contains:

   X  Y  Z  colA  colB colC
0  A  B  C     2     3  NaN
1  A  B  C     2     1  NaN
2  D  E  F     3     4  NaN
3  D  E  F     3     1  NaN
4  D  E  F     3     2  NaN
5  G  H  I     3     0   35
6  G  H  I     3     0   63
7  G  H  I     3     0   78
8  J  K  L     0     0   2H
9  J  K  L     0     0   5B
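For anyone who wants to reproduce this, the table above could be built roughly as follows. This is a sketch under two assumptions: that colC holds strings (the .str accessor needs them) and that the empty cells are NaN.

```python
import numpy as np
import pandas as pd

# Values copied from the printed table; colC is assumed to hold strings,
# with NaN marking the rows where it is missing.
df = pd.DataFrame({
    "X": list("AADDDGGGJJ"),
    "Y": list("BBEEEHHHKK"),
    "Z": list("CCFFFIIILL"),
    "colA": [2, 2, 3, 3, 3, 3, 3, 3, 0, 0],
    "colB": [3, 1, 4, 1, 2, 0, 0, 0, 0, 0],
    "colC": [np.nan] * 5 + ["35", "63", "78", "2H", "5B"],
})
print(df)
```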

And to fill the result column, run:

df['Result'] = df.groupby(['X','Y','Z'], group_keys=False).apply(myFun)

The result is:

   X  Y  Z  colA  colB colC Result
0  A  B  C     2     3  NaN      3
1  A  B  C     2     1  NaN      3
2  D  E  F     3     4  NaN      4
3  D  E  F     3     1  NaN      4
4  D  E  F     3     2  NaN      4
5  G  H  I     3     0   35      7
6  G  H  I     3     0   63      7
7  G  H  I     3     0   78      7
8  J  K  L     0     0   2H      5
9  J  K  L     0     0   5B      5

Or, to place the result in colB, change the output column name in the above code.
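One thing to watch when overwriting colB: the branches that slice colC (for example b.colC.str[0].max()) return strings, so the combined result mixes integers with strings such as '7'. If a numeric column is wanted, pd.to_numeric is one way to normalize it. The sketch below uses a hand-written stand-in Series (hypothetical, not the real output of myFun):

```python
import pandas as pd

# Stand-in for the per-row results: ints taken from colB plus a string
# produced by slicing colC.
result = pd.Series([3, 3, 4, "7", 5], dtype=object)

# errors="coerce" turns anything that cannot be parsed into NaN
# instead of raising an exception.
numeric = pd.to_numeric(result, errors="coerce")
print(numeric.tolist())  # [3, 3, 4, 7, 5]
```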

1 answer

solution1
0 ACCEPTED 2021-03-17 19:03:12
