
Pandas Dataframe containing Numpy ndarray and mean

I have a Pandas dataframe containing Numpy ndarrays:

import numpy as np, pandas as pd
x = pd.DataFrame(columns=['a', 'b'])
x.loc['t1'] = [np.random.rand(2000, 500), np.random.rand(2000)]
x.loc['t2'] = [np.random.rand(2000, 500), np.random.rand(2000)]
x.loc['t3'] = [np.random.rand(2000, 500), np.random.rand(2000)]
print(x)
#                                                     a                                                  b
# t1  [[0.8613174378493778, 0.5959214775442211, 0.62...  [0.4603835101674928, 0.3552761341266353, 0.949...
# t2  [[0.15792328922236398, 0.4274550633264813, 0.5...  [0.20059737978647396, 0.9445869962005252, 0.38...
# t3  [[0.43047697993868284, 0.7127140849172484, 0.4...  [0.6868215656323862, 0.14146376237438463, 0.51...

This works: it averages the NumPy arrays in column b element-wise across the 3 rows (a mean along the vertical axis), returning a single array of length 2000:

x.loc[:, 'b'].mean()
# [0.44926749 0.4804423  0.61566989 ... 0.4717142  0.70605732 0.55848075]
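To make the vertical mean concrete, here is a minimal sketch on a smaller stand-in for column b. It assumes every array in the column has the same length; the names b, stacked and across_rows are illustrative and not part of the original code.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Stand-in for column b: three labelled rows, each holding a 1-D array of length 5
b = pd.Series([rng.random(5) for _ in range(3)], index=["t1", "t2", "t3"], name="b")

# Stack the per-row arrays into one regular 2-D array of shape (3, 5)
stacked = np.stack(b.to_numpy())

# Averaging over axis 0 collapses the three rows into one array of length 5
across_rows = stacked.mean(axis=0)

# The same thing written out by hand: add the three arrays, divide by the row count
by_hand = (b["t1"] + b["t2"] + b["t3"]) / 3
```

Both across_rows and by_hand hold one value per array position, which is why the result above is an array rather than one number per row.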

But how do I compute the mean along the other axis, i.e. one scalar per row? This fails:

x.loc[:, 'b'].mean(axis=1)   # or axis="b"
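For context, x.loc[:, 'b'] is a Series, and a Series has only one axis (axis 0), so pandas rejects axis=1 before it looks at the arrays stored in each cell. A minimal sketch of that failure on a small stand-in Series (the names s, failed and message are illustrative):

```python
import numpy as np
import pandas as pd

# Small stand-in for column b: each cell holds a 1-D array
s = pd.Series([np.arange(4.0), np.arange(4.0) + 1], index=["t1", "t2"])

failed = False
message = ""
try:
    s.mean(axis=1)  # a Series is one-dimensional, so there is no axis 1
except ValueError as exc:
    failed = True
    message = str(exc)

print(failed, message)
```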

Expected result:

           b
t1         0.46
t2         0.31
t3         0.79

You can apply a mean function to each element of the column and store the result as a new column in x, like this:

import numpy as np, pandas as pd
x = pd.DataFrame(columns=['a', 'b'])
x.loc['t1'] = [np.random.rand(2000, 500), np.random.rand(2000)]
x.loc['t2'] = [np.random.rand(2000, 500), np.random.rand(2000)]
x.loc['t3'] = [np.random.rand(2000, 500), np.random.rand(2000)]

x["b_mean"] = x["b"].apply(lambda y: np.mean(y))
# or just:
x["b_mean"] = x["b"].apply(np.mean)

Printing x["b_mean"] then gives one value per row (each close to 0.5, as expected for uniform random data):

t1    0.506371
t2    0.501433
t3    0.493867
Name: b_mean, dtype: float64
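If the result should be a one-column frame labelled b, as in the expected output of the question, the following sketch shows two ways that might work. The second assumes that all arrays in the column have the same length, so that they can be stacked into a regular 2-D array. The names via_apply and via_stack are illustrative only.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Column b as in the question: one 1-D array of length 2000 per row
x = pd.DataFrame(
    {"b": [rng.random(2000) for _ in range(3)]},
    index=["t1", "t2", "t3"],
)

# Option 1: per-cell mean via apply, kept as a one-column DataFrame named b
via_apply = x["b"].apply(np.mean).to_frame("b")

# Option 2: stack the cells into a (3, 2000) array and average along axis 1
# (only valid when every array in the column has the same length)
stacked = np.stack(x["b"].to_numpy())
via_stack = pd.DataFrame({"b": stacked.mean(axis=1)}, index=x.index)

print(via_apply)
```

The apply version also works when the arrays differ in length. For column a, whose cells are 2-D arrays, apply(np.mean) would average over all elements of each array; a lambda such as lambda m: m.mean(axis=0) could be used instead if per-column means inside each cell are wanted.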
