I'm trying to figure out why pandas' mean() works on a column whose cells are NumPy arrays (an object-dtype Series of ndarrays), but std() fails on the same data. The following is a minimal example.
import numpy as np
import pandas as pd
x = np.array([1,2,3])
y = np.array([4,5,6])
df = pd.DataFrame({"numpy": [x,y]})
df["numpy"].mean() #works as expected
Out[231]: array([ 2.5, 3.5, 4.5])
df["numpy"].std() #does not work as expected
TypeError: setting an array element with a sequence.
However, if I call the same methods on the underlying NumPy array via .values, both work:
df["numpy"].values.mean() #works as expected
Out[231]: array([ 2.5, 3.5, 4.5])
df["numpy"].values.std() #works as expected
Out[233]: array([ 1.5, 1.5, 1.5])
Debug information:
df["numpy"].dtype
Out[235]: dtype('O')
df["numpy"][0].dtype
Out[236]: dtype('int32')
df["numpy"].describe()
Out[237]:
count 2
unique 2
top [1, 2, 3]
freq 1
Name: numpy, dtype: object
df["numpy"]
Out[238]:
0 [1, 2, 3]
1 [4, 5, 6]
Name: numpy, dtype: object
Assuming you have the following original DataFrame (with NumPy arrays of the same shape in its cells):
In [320]: df
Out[320]:
file numpy
0 x [1, 2, 3]
1 y [4, 5, 6]
Convert it to the following format, with one scalar column per array element:
In [321]: d = pd.DataFrame(df['numpy'].values.tolist(), index=df['file'])
In [322]: d
Out[322]:
0 1 2
file
x 1 2 3
y 4 5 6
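For anyone who wants to reproduce the conversion from scratch, here is a minimal self-contained sketch. It assumes the `file` column simply holds the labels "x" and "y" shown above; the question's own DataFrame has no such column, so that part is a guess at how the frame was built.

```python
import numpy as np
import pandas as pd

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])

# Assumed construction of the frame shown above: one label column,
# one object-dtype column whose cells are equal-length arrays.
df = pd.DataFrame({"file": ["x", "y"], "numpy": [x, y]})

# Expand each array into a row of scalar columns, indexed by the label.
d = pd.DataFrame(df["numpy"].tolist(), index=df["file"])

print(d.dtypes)  # every column is now a numeric dtype, not object
```

Once the columns are numeric rather than object dtype, the usual reductions work column by column.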
Now you can use the full power of pandas, NumPy and SciPy:
In [323]: d.sum(axis=1)
Out[323]:
file
x 6
y 15
dtype: int64
In [324]: d.sum(axis=0)
Out[324]:
0 5
1 7
2 9
dtype: int64
In [325]: d.mean(axis=0)
Out[325]:
0 2.5
1 3.5
2 4.5
dtype: float64
In [327]: d.std(axis=0)
Out[327]:
0 2.12132
1 2.12132
2 2.12132
dtype: float64
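Note that the 2.12132 here differs from the 1.5 that `.values.std()` returned in the question. As far as I know, that is only because pandas' std() defaults to ddof=1 (sample standard deviation) while NumPy's std() defaults to ddof=0 (population standard deviation). A small sketch, using np.stack as an alternative way to get a regular 2-D array out of the column:

```python
import numpy as np
import pandas as pd

x = np.array([1, 2, 3])
y = np.array([4, 5, 6])
df = pd.DataFrame({"numpy": [x, y]})

# Stack the per-cell arrays into one regular 2-D array of shape (2, 3).
stacked = np.stack(df["numpy"].tolist())

# ddof=1 matches the pandas default, ddof=0 matches the NumPy default.
sample_std = stacked.std(axis=0, ddof=1)
population_std = stacked.std(axis=0, ddof=0)

print(sample_std)
print(population_std)
```

Pass ddof explicitly on whichever side you use if the two results need to agree.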