Dataframe summary math based on condition from another dataframe?

Question

I have what amounts to 3D data, but I can't install the xarray package that pandas recommends.

df_values

   | a    b    c
-----------------
0  | 5    9    2
1  | 6    9    5
2  | 1    6    8

df_condition

   | a    b    c
-----------------
0  | y    y    y
1  | y    n    y
2  | n    n    y

I know I can get the average of all values in df_values like this:

df_values.stack().mean()

Question... 👇
What is the simplest way to find the average of the values in df_values where df_condition == "y"?
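
For anyone who wants to reproduce this, here is a minimal sketch that builds the two frames shown above. It assumes pandas is importable as pd; the variable name overall_mean is only illustrative.

```python
import pandas as pd

# Values and conditions copied from the tables in the question.
df_values = pd.DataFrame({"a": [5, 6, 1], "b": [9, 9, 6], "c": [2, 5, 8]})
df_condition = pd.DataFrame(
    {"a": ["y", "y", "n"], "b": ["y", "n", "n"], "c": ["y", "y", "y"]}
)

# Unconditional average over every cell: stack() flattens to one Series.
overall_mean = df_values.stack().mean()
print(overall_mean)
```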

Answer 1

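If I understand correctly, you can use a Boolean mask. This gives the mean of the per-column means:

df_values[df_condition.eq('y')].mean().mean()
6.5

Or you may want the mean over all selected values, which weights every cell equally:

df_values[df_condition.eq('y')].sum().sum() / df_condition.eq('y').sum().sum()
5.833333333333333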
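
The two results differ because the columns keep different numbers of values after masking. A small sketch of the arithmetic, assuming the frames from the question (the names masked, col_means, mean_of_means and overall are only illustrative):

```python
import pandas as pd

df_values = pd.DataFrame({"a": [5, 6, 1], "b": [9, 9, 6], "c": [2, 5, 8]})
df_condition = pd.DataFrame(
    {"a": ["y", "y", "n"], "b": ["y", "n", "n"], "c": ["y", "y", "y"]}
)

# Cells where the condition is not 'y' become NaN.
masked = df_values[df_condition.eq("y")]

# Per-column means skip NaN: a -> 5.5 (2 values), b -> 9.0 (1 value), c -> 5.0 (3 values).
col_means = masked.mean()

# Averaging those three numbers gives each column equal weight.
mean_of_means = col_means.mean()

# Dividing the total by the number of kept cells gives each cell equal weight.
overall = masked.sum().sum() / df_condition.eq("y").sum().sum()

print(mean_of_means, overall)
```

Column b contributes a single value (9) yet counts for a third of the first result, which is why it comes out higher.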

Answer 2

Assuming you wish to find the mean of all values where df_condition == 'y':

res = np.nanmean(df_values[df_condition.eq('y')])  # 5.833333333333333

Indexing the underlying NumPy array is substantially cheaper than using pandas stack or where:

# Pandas 0.23.0, NumPy 1.14.3
n = 10**5
df_values = pd.concat([df_values]*n, ignore_index=True)
df_condition = pd.concat([df_condition]*n, ignore_index=True)

%timeit np.nanmean(df_values.values[df_condition.eq('y')])       # 32 ms
%timeit np.nanmean(df_values.where(df_condition == 'y').values)  # 88 ms
%timeit df_values[df_condition.eq('y')].stack().mean()           # 107 ms
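
The fastest variant above works on the raw array. A hedged sketch of the same idea with both the values and the mask converted to NumPy arrays explicitly; it assumes pandas 0.24 or later for to_numpy() (on older versions .values plays the same role), and the names mask, selected and result are only illustrative:

```python
import numpy as np
import pandas as pd

df_values = pd.DataFrame({"a": [5, 6, 1], "b": [9, 9, 6], "c": [2, 5, 8]})
df_condition = pd.DataFrame(
    {"a": ["y", "y", "n"], "b": ["y", "n", "n"], "c": ["y", "y", "y"]}
)

# Boolean 2-D array: True where the condition is 'y'.
mask = df_condition.eq("y").to_numpy()

# Boolean indexing on the raw array returns a 1-D array of the kept values,
# so no NaN is introduced and a plain mean is enough.
selected = df_values.to_numpy()[mask]
result = selected.mean()

print(result)
```

This relies on the two frames having the same shape and row/column order, since NumPy indexing ignores the pandas labels.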

Answer 3

You can get the mean of all values where the condition is 'y' using only pandas DataFrame and Series methods, as shown below:

df_values[df_condition.eq('y')].stack().mean()  # 5.833333333333333

or

df_values[df_condition == 'y'].stack().mean()  # 5.833333333333333

Is this simple? :)

Answer 4

Try:

np.nanmean(df_values.where(df_condition == 'y').values)

Output:

5.8333333333

4 answers

Answer 1    1    2018-12-28 02:45:44

Answer 2    1    ACCEPTED    2018-12-28 02:51:03

Answer 3    1    2018-12-28 02:55:55

Answer 4    1    2018-12-28 03:20:33
