I have what amounts to 3D data, but I can't install xarray, the package Pandas recommends for this. The data is held in two DataFrames of the same shape, df_values and df_condition:
df_values:
  | a b c
-----------
0 | 5 9 2
1 | 6 9 5
2 | 1 6 8
df_condition:
  | a b c
-----------
0 | y y y
1 | y n y
2 | n n y
I know I can get the average of all values in df_values like this:
df_values.stack().mean()
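To make the examples on this page reproducible, here is a minimal sketch that rebuilds the two frames from the tables above (assuming a default 0..2 index and columns a, b, c). It computes the same all-cells average without stack(), once from the underlying NumPy array and once as total divided by cell count.

```python
import pandas as pd

# Rebuilt from the tables in the question (assumed default RangeIndex 0..2).
df_values = pd.DataFrame({"a": [5, 6, 1], "b": [9, 9, 6], "c": [2, 5, 8]})
df_condition = pd.DataFrame(
    {"a": ["y", "y", "n"], "b": ["y", "n", "n"], "c": ["y", "y", "y"]}
)

# Mean over every cell, taken from the flat NumPy view of the frame.
overall_mean_np = df_values.to_numpy().mean()

# Same value as "sum of all cells / number of cells".
overall_mean_ratio = df_values.sum().sum() / df_values.size

print(overall_mean_np, overall_mean_ratio)  # both 51 / 9, roughly 5.667
```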
Question... 👇
What is the simplest way to find the average of df_values where df_condition == "y"?
IIUC (if I understand correctly), use a Boolean mask. Here df is df_values and c is df_condition:
df[c.eq('y')].mean().mean()
6.5
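The 6.5 comes from averaging the per-column means of the masked frame. This sketch, which rebuilds the question's frames as an assumption, shows the intermediate column means so you can see where the number comes from.

```python
import pandas as pd

df_values = pd.DataFrame({"a": [5, 6, 1], "b": [9, 9, 6], "c": [2, 5, 8]})
df_condition = pd.DataFrame(
    {"a": ["y", "y", "n"], "b": ["y", "n", "n"], "c": ["y", "y", "y"]}
)

# Cells whose condition is not 'y' become NaN; the shape stays 3x3.
masked = df_values[df_condition.eq("y")]

# Column means skip NaN: a -> (5 + 6) / 2, b -> 9 / 1, c -> (2 + 5 + 8) / 3
col_means = masked.mean()

# Averaging those weights every column equally, whatever its number of 'y' cells.
mean_of_means = col_means.mean()
print(col_means.tolist(), mean_of_means)  # [5.5, 9.0, 5.0] 6.5
```

Because column b contributes a single 9 with the same weight as column c's three values, this differs from the mean over all selected cells.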
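Note that this is the mean of the per-column means, so each column counts equally regardless of how many 'y' cells it has. For the mean over all selected cells, you may want: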
df[c.eq('y')].sum().sum()/c.eq('y').sum().sum()
5.833333333333333
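Spelled out step by step, the second expression is "sum of the selected cells divided by the number of selected cells". A small sketch on frames rebuilt from the question (an assumption about the exact data layout):

```python
import pandas as pd

df_values = pd.DataFrame({"a": [5, 6, 1], "b": [9, 9, 6], "c": [2, 5, 8]})
df_condition = pd.DataFrame(
    {"a": ["y", "y", "n"], "b": ["y", "n", "n"], "c": ["y", "y", "y"]}
)

mask = df_condition.eq("y")

total = df_values[mask].sum().sum()  # 5 + 9 + 2 + 6 + 5 + 8 = 35 (NaN cells skipped)
count = mask.sum().sum()             # True counts as 1, so this is the number of 'y' cells: 6

weighted_mean = total / count
print(total, count, weighted_mean)
```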
Assuming you wish to find the mean of all values where df_condition == 'y':
import numpy as np
res = np.nanmean(df_values[df_condition.eq('y')])  # 5.833333333333333
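A variant that never creates NaN at all: boolean-index the raw NumPy array with the raw mask. This returns a 1-D array of only the selected cells, so a plain mean() suffices. The sketch below rebuilds the question's frames (an assumption) and checks the result against the np.nanmean call above.

```python
import numpy as np
import pandas as pd

df_values = pd.DataFrame({"a": [5, 6, 1], "b": [9, 9, 6], "c": [2, 5, 8]})
df_condition = pd.DataFrame(
    {"a": ["y", "y", "n"], "b": ["y", "n", "n"], "c": ["y", "y", "y"]}
)

values = df_values.to_numpy()
mask = df_condition.eq("y").to_numpy()

# 2-D boolean indexing flattens in row-major order and keeps only True cells.
selected = values[mask]
print(selected)         # [5 9 2 6 5 8]
print(selected.mean())  # 5.833333333333333

# Agrees with the NaN-based np.nanmean approach.
assert np.isclose(selected.mean(), np.nanmean(df_values[df_condition.eq("y")]))
```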
Using NumPy is substantially cheaper than Pandas stack or where:
# Pandas 0.23.0, NumPy 1.14.3
n = 10**5
df_values = pd.concat([df_values]*n, ignore_index=True)
df_condition = pd.concat([df_condition]*n, ignore_index=True)
%timeit np.nanmean(df_values.values[df_condition.eq('y')]) # 32 ms
%timeit np.nanmean(df_values.where(df_condition == 'y').values) # 88 ms
%timeit df_values[df_condition.eq('y')].stack().mean() # 107 ms
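The figures above were measured on Pandas 0.23.0 and NumPy 1.14.3 and will vary by version and machine. Below is a rough sketch for re-running a comparison outside IPython with the standard timeit module. The tile count n, the labels, and the choice of the sum/count expression (from an earlier answer) in place of stack() are illustrative assumptions, and no particular timings are implied.

```python
import timeit

import numpy as np
import pandas as pd

base_values = pd.DataFrame({"a": [5, 6, 1], "b": [9, 9, 6], "c": [2, 5, 8]})
base_condition = pd.DataFrame(
    {"a": ["y", "y", "n"], "b": ["y", "n", "n"], "c": ["y", "y", "y"]}
)

# Tile the 3x3 example so the timings are measurable (n is arbitrary).
n = 10**4
df_values = pd.concat([base_values] * n, ignore_index=True)
df_condition = pd.concat([base_condition] * n, ignore_index=True)

candidates = {
    "numpy boolean index": lambda: np.nanmean(
        df_values.to_numpy()[df_condition.eq("y").to_numpy()]
    ),
    "where + nanmean": lambda: np.nanmean(
        df_values.where(df_condition == "y").to_numpy()
    ),
    "sum / count": lambda: df_values[df_condition.eq("y")].sum().sum()
    / df_condition.eq("y").sum().sum(),
}

results = {}
for name, fn in candidates.items():
    results[name] = fn()
    seconds = timeit.timeit(fn, number=10) / 10  # average seconds per call
    print(f"{name:>20}: {seconds * 1e3:.2f} ms per call, mean = {results[name]}")
```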
You can get the mean of all values where the condition is 'y' using only pandas DataFrame and Series methods, like below:
df_values[df_condition.eq('y')].stack().mean() # 5.833333333333333
or
df_values[df_condition == 'y'].stack().mean() # 5.833333333333333
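Another pandas-only option, sketched here on frames rebuilt from the question (an assumption), skips stack() entirely: count() tallies the non-NaN cells per column, so the masked sum divided by the masked count is the mean of exactly the selected cells.

```python
import pandas as pd

df_values = pd.DataFrame({"a": [5, 6, 1], "b": [9, 9, 6], "c": [2, 5, 8]})
df_condition = pd.DataFrame(
    {"a": ["y", "y", "n"], "b": ["y", "n", "n"], "c": ["y", "y", "y"]}
)

masked = df_values.where(df_condition.eq("y"))

# sum() and count() both ignore NaN, so this averages only the 'y' cells.
pandas_only_mean = masked.sum().sum() / masked.count().sum()
print(pandas_only_mean)  # 5.833333333333333
```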
Is this simple? :)
Try (here df is df_values and dfcon is df_condition):
np.nanmean(df.where(dfcon == 'y').values)
Output:
5.8333333333
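The answers on this page use two spellings of the same mask, df[mask] and df.where(mask). Indexing a DataFrame with a same-shaped boolean DataFrame delegates to where(), so both keep the shape and put NaN where the mask is False. This small check uses frames rebuilt from the question as an assumption.

```python
import numpy as np
import pandas as pd

df_values = pd.DataFrame({"a": [5, 6, 1], "b": [9, 9, 6], "c": [2, 5, 8]})
df_condition = pd.DataFrame(
    {"a": ["y", "y", "n"], "b": ["y", "n", "n"], "c": ["y", "y", "y"]}
)

mask = df_condition == "y"

via_getitem = df_values[mask]      # boolean-DataFrame indexing
via_where = df_values.where(mask)  # explicit where()

# Raises AssertionError if the two frames differ in values, dtypes, or labels.
pd.testing.assert_frame_equal(via_getitem, via_where)

# np.nanmean then skips the NaN cells.
print(np.nanmean(via_where.to_numpy()))  # 5.833333333333333
```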