Why does pandas.DataFrame.mean() work but pandas.DataFrame.std() does not over same data
I am trying to figure out why pandas.DataFrame.mean() works on an ndarray of ndarrays, but pandas.DataFrame.std() does not work on the same data. A minimal example follows.
x = np.array([1,2,3])
y = np.array([4,5,6])
df = pd.DataFrame({"numpy": [x,y]})
df["numpy"].mean() #works as expected
Out[231]: array([ 2.5, 3.5, 4.5])
df["numpy"].std() #does not work as expected
Out[231]: TypeError: setting an array element with a sequence.
However, if I go through .values, both calls work:
df["numpy"].values.mean() #works as expected
Out[231]: array([ 2.5, 3.5, 4.5])
df["numpy"].values.std() #works as expected
Out[233]: array([ 1.5, 1.5, 1.5])
Debug info:
df["numpy"].dtype
Out[235]: dtype('O')
df["numpy"][0].dtype
Out[236]: dtype('int32')
df["numpy"].describe()
Out[237]:
count 2
unique 2
top [1, 2, 3]
freq 1
Name: numpy, dtype: object
df["numpy"]
Out[238]:
0 [1, 2, 3]
1 [4, 5, 6]
Name: numpy, dtype: object
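Assume you have the following original DataFrame (each cell in the numpy column holds a NumPy array, and all arrays have the same shape):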
In [320]: df
Out[320]:
  file      numpy
0    x  [1, 2, 3]
1    y  [4, 5, 6]
Convert it to the following format, with one column per array position:
In [321]: d = pd.DataFrame(df['numpy'].values.tolist(), index=df['file'])
In [322]: d
Out[322]:
      0  1  2
file
x     1  2  3
y     4  5  6
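The same reshaping can also be sketched with np.stack. This is an alternative to .values.tolist() that the answer above does not use; it raises a ValueError if the arrays in the column do not all share one shape, which makes the "same shape" assumption explicit. The frame is rebuilt here from the example data so that the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame: one NumPy array per cell in the "numpy" column.
df = pd.DataFrame({
    "file": ["x", "y"],
    "numpy": [np.array([1, 2, 3]), np.array([4, 5, 6])],
})

# Stack the per-cell arrays into a single 2-D array of shape (rows, array_length).
# np.stack raises ValueError if the arrays differ in shape.
stacked = np.stack(df["numpy"].tolist())

# One column per array position, indexed by the "file" labels.
d = pd.DataFrame(stacked, index=df["file"])
print(d)
```

Once the data is a regular 2-D numeric frame, column-wise reductions such as d.mean(axis=0) and d.std(axis=0) operate on ordinary numeric columns rather than on an object column.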
Now you are free to use all of the Pandas / NumPy / SciPy functionality:
In [323]: d.sum(axis=1)
Out[323]:
file
x 6
y 15
dtype: int64
In [324]: d.sum(axis=0)
Out[324]:
0 5
1 7
2 9
dtype: int64
In [325]: d.mean(axis=0)
Out[325]:
0 2.5
1 3.5
2 4.5
dtype: float64
In [327]: d.std(axis=0)
Out[327]:
0 2.12132
1 2.12132
2 2.12132
dtype: float64
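Note that the 2.12132 here differs from the 1.5 returned by df["numpy"].values.std() in the question. The two libraries use different defaults: pandas' std() uses ddof=1 (sample standard deviation), while NumPy's std() uses ddof=0 (population standard deviation). A minimal sketch of reconciling the two, with d rebuilt from the example data:

```python
import numpy as np
import pandas as pd

# Rebuild the reshaped frame from the example data.
d = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6]],
    index=pd.Index(["x", "y"], name="file"),
)

sample_std = d.std(axis=0)              # pandas default: ddof=1
population_std = d.std(axis=0, ddof=0)  # same divisor as NumPy's default
numpy_std = d.to_numpy().std(axis=0)    # NumPy default: ddof=0

print(sample_std.tolist())
print(population_std.tolist())
print(numpy_std.tolist())
```

Passing ddof explicitly on whichever side you call avoids surprises when comparing results between the two libraries.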