I have a pandas DataFrame in which three columns, X, Y and Z, are used for grouping. For each group I want to update column colB (or store the result in a separate column) based on the conditions shown in the code. However, all I get as the final result is nulls, and I'm not sure what I am doing wrong.
The sample table does not cover every case, but the code below handles all of them:
```python
group=df.groupby(['X','Y','Z'])
for a,b in group:
    if ((b.colA==2).all()):
        df['colB']=b.colB.max()
    elif (((b.colA>2).all()) and (b.colB.max() >=2)):
        df['colB']=b.colB.max()
    elif (((b.ColC.str.isdigit()).all()) and ((b.ColC.str.len()==2).all())):
        df['colB']=b.ColC.str[0].max()
    elif (((b.ColC.str.isdigit()).all()) and ((b.ColC.str.len()>2).all())):
        df['ColB']=b.ColC.str[:-2].max()
    elif ((b.ColC.str[0].str.isdigit().all()) and (b.ColC.str.contains('[A-Z]').all()) and
          (b.ColC.str[-1].str.isalpha().all())):
        df['colB']=b.ColC.str[:-1].astype(float).max()
    elif (b.ColC.str[0].str.isalpha().all() and b.ColC.str.contains('[0-9]').all()):
        df['ColB']=len(set(" ".join(re.findall("[A-Z]+", str(b.ColC)))))
    else:
        df['colB']=np.nan
```
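The main flaw in your code is that each assignment such as df['colB'] = ... writes one value into the whole colB column, whereas it should be set only in the rows of the current group. Each iteration overwrites the previous one, so after the loop only the value computed for the last group remains, and if that group falls through to the else branch, the entire column is NaN. (Column names are also case-sensitive: the branches that assign to df['ColB'] create a separate column instead of updating colB.)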
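The overwrite effect can be reproduced with a toy frame (the data below is hypothetical, not taken from the question). This minimal sketch contrasts assigning a scalar to the whole column inside the loop with writing through `.loc[g.index, ...]`, which touches only the current group's rows. Using `.loc` this way is one possible fix that keeps your loop; the function-based approach is the one recommended here.

```python
import pandas as pd

# Hypothetical toy data: two groups keyed by X.
df = pd.DataFrame({"X": ["a", "a", "b", "b"], "colB": [1, 5, 2, 7]})

# Buggy pattern: each iteration assigns a scalar to the WHOLE column,
# so after the loop every row holds the last group's value.
buggy = df.copy()
for _, g in df.groupby("X"):
    buggy["colB"] = g["colB"].max()

# Fixed pattern: write only to the rows that belong to the current group.
fixed = df.copy()
for _, g in df.groupby("X"):
    fixed.loc[g.index, "colB"] = g["colB"].max()

print(buggy["colB"].tolist())  # [7, 7, 7, 7]
print(fixed["colB"].tolist())  # [5, 5, 7, 7]
```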
To do your task the right way, define a function to be applied to each group:
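```python
import numpy as np
import pandas as pd

def myFun(b):
    # b is the sub-DataFrame holding one (X, Y, Z) group
    if (b.colA == 2).all():
        rv = b.colB.max()
    elif (b.colA > 2).all() and (b.colB.max() >= 2):
        rv = b.colB.max()
    elif b.colC.str.isdigit().all() and (b.colC.str.len() == 2).all():
        rv = b.colC.str[0].max()  # largest first digit, e.g. '35', '78' -> '7'
    elif b.colC.str.isdigit().all() and (b.colC.str.len() > 2).all():
        rv = b.colC.str[:-2].max()
    elif b.colC.str[0].str.isdigit().all() and b.colC.str[-1].str.isalpha().all():
        rv = b.colC.str[:-1].astype(int).max()  # numeric prefix, e.g. '2H' -> 2
    elif b.colC.str[0].str.isalpha().all() and b.colC.str.contains('[0-9]').all():
        # distinct capital letters in the first run of capitals of each value;
        # dropna() skips values that contain no capitals at all
        rv = len(set("".join(b.colC.str.extract("([A-Z]+)")[0].dropna())))
    else:
        rv = np.nan
    # broadcast the group's single value to every row of the group
    return pd.Series(rv, index=b.index)
```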
Another problem is in your data: the last group ('J', 'K', 'L') would be handled by the first branch (colA == 2). To route it to the fifth branch, I set colA to 0 in this group, so the source DataFrame contains:
```
   X  Y  Z  colA  colB colC
0  A  B  C     2     3  NaN
1  A  B  C     2     1  NaN
2  D  E  F     3     4  NaN
3  D  E  F     3     1  NaN
4  D  E  F     3     2  NaN
5  G  H  I     3     0   35
6  G  H  I     3     0   63
7  G  H  I     3     0   78
8  J  K  L     0     0   2H
9  J  K  L     0     0   5B
```
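To reproduce the example, the frame above can be built as follows. This is a sketch that assumes colC holds strings with NaN for missing values, which is what the `.str`-based conditions in myFun expect.

```python
import numpy as np
import pandas as pd

# Sample data from the table above; colC is assumed to hold strings
# (NaN where the value is missing) so that the .str accessor works on it.
df = pd.DataFrame({
    "X": list("AADDDGGGJJ"),
    "Y": list("BBEEEHHHKK"),
    "Z": list("CCFFFIIILL"),
    "colA": [2, 2, 3, 3, 3, 3, 3, 3, 0, 0],
    "colB": [3, 1, 4, 1, 2, 0, 0, 0, 0, 0],
    "colC": [np.nan] * 5 + ["35", "63", "78", "2H", "5B"],
})
print(df)
```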
And to fill the result column, run:
```python
df['Result'] = df.groupby(['X','Y','Z'], group_keys=False).apply(myFun)
```
The result is:
```
   X  Y  Z  colA  colB colC Result
0  A  B  C     2     3  NaN      3
1  A  B  C     2     1  NaN      3
2  D  E  F     3     4  NaN      4
3  D  E  F     3     1  NaN      4
4  D  E  F     3     2  NaN      4
5  G  H  I     3     0   35      7
6  G  H  I     3     0   63      7
7  G  H  I     3     0   78      7
8  J  K  L     0     0   2H      5
9  J  K  L     0     0   5B      5
```
To write the result into colB instead, change the target column name in the assignment above.
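An alternative sketch, not part of the answer above: let the per-group function return a single scalar, have `groupby().apply()` produce one value per (X, Y, Z) key, and merge that back onto the rows. The helper name `group_value` is hypothetical; its rules mirror myFun, and the data is the modified sample from this answer. Selecting only colA, colB and colC before `apply` keeps the grouping columns out of the function, which should avoid the deprecation warning that recent pandas versions emit when `apply` operates on grouping columns.

```python
import numpy as np
import pandas as pd

keys = ["X", "Y", "Z"]
df = pd.DataFrame({
    "X": list("AADDDGGGJJ"),
    "Y": list("BBEEEHHHKK"),
    "Z": list("CCFFFIIILL"),
    "colA": [2, 2, 3, 3, 3, 3, 3, 3, 0, 0],
    "colB": [3, 1, 4, 1, 2, 0, 0, 0, 0, 0],
    "colC": [np.nan] * 5 + ["35", "63", "78", "2H", "5B"],
})


def group_value(b):
    """Return ONE value for a group (hypothetical helper mirroring myFun)."""
    if (b.colA == 2).all():
        return b.colB.max()
    if (b.colA > 2).all() and b.colB.max() >= 2:
        return b.colB.max()
    digits = b.colC.str.isdigit().all()
    if digits and (b.colC.str.len() == 2).all():
        return b.colC.str[0].max()
    if digits and (b.colC.str.len() > 2).all():
        return b.colC.str[:-2].max()
    if b.colC.str[0].str.isdigit().all() and b.colC.str[-1].str.isalpha().all():
        return b.colC.str[:-1].astype(int).max()
    if b.colC.str[0].str.isalpha().all() and b.colC.str.contains("[0-9]").all():
        return len(set("".join(b.colC.str.extract("([A-Z]+)")[0].dropna())))
    return np.nan


# One scalar per (X, Y, Z) key, returned as a Series with a MultiIndex.
per_group = df.groupby(keys)[["colA", "colB", "colC"]].apply(group_value)

# Attach each group's value to all of its rows; a left merge keeps row order.
df = df.merge(per_group.rename("Result").reset_index(), on=keys, how="left")
print(df)
```

Note that the third rule returns a string (for example '7'), so Result ends up with object dtype; wrap that return in `int(...)` if purely numeric output is wanted.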