Calculate mean numpy array in pandas DataFrame
My DataFrame consists of numpy arrays:
col1 \
0 [[[0.878617777607, 0.712102459231, 0.652479557...
1 [[[0.0815294305642, 0.793893471424, 0.24718091...
2 [[[0.611498467162, 0.880551635123, 0.949764900...
col2 \
0 [[[0.390629506277, 0.0318899771374, 0.28308523...
1 [[[0.578710371447, 0.385239304185, 0.330119601...
2 [[[0.843661601339, 0.402833961663, 0.535083132...
col3
0 [[[0.162446865578, 0.165619948624, 0.622459063...
1 [[[0.859362904741, 0.415994003318, 0.706308170...
2 [[[0.0559589731135, 0.307840549475, 0.80023067...
How can I calculate the mean numpy array in this DataFrame? The result should be a numpy array that represents the mean of all numpy arrays inside my DataFrame.
Code:
import numpy as np
import pandas as pd
df = pd.DataFrame({'col1': [np.random.rand(4,4,4) for i in range(3)],
                   'col2': [np.random.rand(4,4,4) for i in range(3)],
                   'col3': [np.random.rand(4,4,4) for i in range(3)]})
Expected output (for the code above): a numpy array that represents the mean of all numpy arrays, for example:
array([[[ 0.44091592, 0.81509111, 0.94968265, 0.60255149],
[ 0.49263418, 0.69519008, 0.05023616, 0.67871942],
[ 0.72771491, 0.9593636 , 0.84673578, 0.43407915],
[ 0.5884133 , 0.63940507, 0.53364733, 0.51271129]],
[[ 0.55612852, 0.58847166, 0.37781843, 0.7693527 ],
[ 0.40610198, 0.05897461, 0.945253 , 0.66332715],
[ 0.74352406, 0.34969614, 0.50384616, 0.90582012],
[ 0.38734233, 0.85533348, 0.94869219, 0.2863428 ]],
[[ 0.81782769, 0.8856158 , 0.68744406, 0.76579709],
[ 0.05843924, 0.83090709, 0.99446694, 0.74937771],
[ 0.11898717, 0.38715644, 0.50348724, 0.41903257],
[ 0.21359555, 0.93407981, 0.20531033, 0.71017461]],
[[ 0.88758803, 0.40433699, 0.02888434, 0.91075114],
[ 0.84047283, 0.87119432, 0.14844659, 0.87643422],
[ 0.06412383, 0.60458874, 0.47277274, 0.12969607],
[ 0.31917517, 0.15647266, 0.89773897, 0.77962999]]])
I tried df.mean(), but it returns Series([], dtype: float64).
I also tried df.mean(axis=1).mean(), which returns NaN.
UPDATE:
A much simpler example:
df = pd.DataFrame({'col1': [np.array([[1,3],[4,2]]), np.array([[1,1],[3,2]])],
                   'col2': [np.array([[1,3],[3,3]]), np.array([[2,3],[3,1]])]})
DataFrame:
Out[31]:
col1 col2
0 [[1, 3], [4, 2]] [[1, 3], [3, 3]]
1 [[1, 1], [3, 2]] [[2, 3], [3, 1]]
Expected output:
In[42]: (df.iloc[0,0]+df.iloc[0,1]+df.iloc[1,0]+df.iloc[1,1])/4.
Out[42]:
array([[ 1.25, 2.5 ],
[ 3.25, 2. ]])
Sorry, I misunderstood your question earlier. Please try this:
import numpy as np
import pandas as pd
df = pd.DataFrame({'col1': [np.array([[1.,3.],[4.,2.]]), np.array([[1.,1.],[3.,2.]])],
                   'col2': [np.array([[1.,3.],[3.,3.]]), np.array([[2.,3.],[3.,1.]])]})
print(df)
# as_matrix() was removed in pandas 1.0; to_numpy() is its replacement.
print(np.expand_dims(df.to_numpy(), axis=1).mean())
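If you would rather not depend on how NumPy reduces an object array, here is a minimal sketch of an alternative: flatten the DataFrame's cells, stack them into one regular array, and average along the new leading axis. It assumes every cell holds an array of the same shape. The helper name mean_of_cells is made up for illustration and is not part of pandas or NumPy.

```python
import numpy as np
import pandas as pd

def mean_of_cells(frame):
    """Element-wise mean of all equally shaped arrays stored in `frame`.

    Hypothetical helper for illustration; assumes every cell is an
    ndarray with the same shape.
    """
    # to_numpy() gives a 2-D object array whose elements are the cell arrays;
    # ravel() flattens it into a 1-D sequence of those arrays.
    cells = list(frame.to_numpy().ravel())
    # np.stack builds one regular array of shape (n_cells, *cell_shape).
    stacked = np.stack(cells)
    # Averaging over axis 0 collapses the cell axis and keeps the cell shape.
    return stacked.mean(axis=0)

df = pd.DataFrame({'col1': [np.array([[1, 3], [4, 2]]), np.array([[1, 1], [3, 2]])],
                   'col2': [np.array([[1, 3], [3, 3]]), np.array([[2, 3], [3, 1]])]})
result = mean_of_cells(df)
print(result)
```

Because the arrays are stacked into a numeric array before averaging, this also works when the cells hold integers.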
I don't know why pandas is allergic to computing mean() on a DataFrame, but here is a workaround:
>>> df = pd.DataFrame({'col1': [np.array([[1,3],[4,2]]), np.array([[1,1],[3,2]])],
... 'col2': [np.array([[1,3],[3,3]]), np.array([[2,3],[3,1]])]})
>>> np.mean([df[col].mean() for col in df.columns], axis=0)
array([[1.25, 2.5 ],
[3.25, 2. ]])
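The workaround above averages the per-column means. That equals the mean over all cells only because every column contributes the same number of arrays. A small sketch to check this on the example data, using np.stack for the per-column step instead of Series.mean (my substitution, not the code above), with variable names chosen for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [np.array([[1, 3], [4, 2]]), np.array([[1, 1], [3, 2]])],
                   'col2': [np.array([[1, 3], [3, 3]]), np.array([[2, 3], [3, 1]])]})

# Mean of each column: stack that column's arrays and average over the row axis.
col_means = [np.stack(df[col].tolist()).mean(axis=0) for col in df.columns]
# Mean of the per-column means.
mean_of_means = np.mean(col_means, axis=0)

# Mean over every cell at once, for comparison.
all_cells = [arr for col in df.columns for arr in df[col]]
overall = np.stack(all_cells).mean(axis=0)

print(mean_of_means)
print(np.allclose(mean_of_means, overall))
```

If some cells were missing, the columns would no longer carry equal weight and the two results could differ.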
Doing df.mean(axis=0).mean(axis=1) throws an exception:
ValueError: If using all scalar values, you must pass an index