The result I want is what the following SQL statement produces:
UPDATE table_A SET final=(cs+fhfa+sz)/3 WHERE cs IS NOT NULL AND fhfa IS NOT NULL AND sz IS NOT NULL;
Here cs, fhfa and sz are individual columns in the SQL table (and in the DataFrame).
If I convert this SQL statement to a pandas operation in Python, it would look something like this:
df['div_3'] = (df.cs+df.fhfa+df.sz) /3
df['final'] = df.loc[(df['cs'] != None) & (df['fhfa'] != None) & (df['sz'] != None) ] = df['div_3']
But this does not guarantee that each computed value ends up in its corresponding row. How can I achieve this?
Do I really need to create another column, div_3, holding the sum of the three columns divided by 3? Can this be done without creating an extra column?
Filter on pd.Series.notnull and call mean.
c = ['cs', 'fhfa', 'sz']
df['final'] = df[df[c].notnull().all(1)][c].mean(1)
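To see what this does, here is a minimal sketch on a made-up DataFrame. The sample values are assumptions for illustration; only the column names come from the question. The filter drops every row that has a missing value, and because the assignment aligns on the index, those rows simply get NaN in final.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df = pd.DataFrame({
    'cs':   [1.0, 2.0, np.nan],
    'fhfa': [2.0, np.nan, 6.0],
    'sz':   [3.0, 4.0, 9.0],
})

c = ['cs', 'fhfa', 'sz']

# True only for rows where all three columns are present.
complete = df[c].notnull().all(axis=1)

# The row-wise mean is computed for complete rows only; on assignment
# pandas aligns by index, so the remaining rows receive NaN.
df['final'] = df.loc[complete, c].mean(axis=1)

print(df)
```

No helper column such as div_3 is needed: only the first row is complete, so it gets (1 + 2 + 3) / 3 = 2.0 and the other two rows get NaN.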
IIUC:
df.loc[:, 'final'] = df.loc[df[['cs','fhfa','sz']].notnull().all(1), ['cs','fhfa','sz']].sum(1)/3
.all(1) is the same as .all(axis=1), which means all values in each row must be True.
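One difference from the SQL is worth noting: UPDATE ... WHERE leaves final untouched in rows that do not match, whereas assigning a whole column replaces those rows with NaN. If final already exists and its old values should be kept, a possible approach is to use the same boolean mask on both sides of the assignment. The sketch below assumes a pre-existing final column filled with made-up placeholder values.

```python
import numpy as np
import pandas as pd

# Hypothetical data, including an already populated 'final' column.
df = pd.DataFrame({
    'cs':    [1.0, 2.0, np.nan],
    'fhfa':  [2.0, np.nan, 6.0],
    'sz':    [3.0, 4.0, 9.0],
    'final': [0.0, -1.0, -1.0],
})

cols = ['cs', 'fhfa', 'sz']

# Equivalent of: WHERE cs IS NOT NULL AND fhfa IS NOT NULL AND sz IS NOT NULL
mask = df[cols].notnull().all(axis=1)

# Equivalent of: SET final = (cs + fhfa + sz) / 3, for matching rows only.
# Rows where mask is False keep their previous 'final' value.
df.loc[mask, 'final'] = df.loc[mask, cols].sum(axis=1) / 3

print(df)
```

Here only the first row is updated (to 2.0); the other two rows keep their placeholder value of -1.0.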