Resampling multilevel index, or averaging along third dimension of a matrix/array

I have gridded satellite data stored in a dataframe. Normally, this dataframe gets sliced to make imshow plots on a day-by-day basis, which is trivial. However, I would like to plot annual means of the data, which is where I am currently stuck. The dataframe has a multi-level index (datetime, latitude coordinate), with the columns making up the longitude coordinates.

import pandas as pd, numpy as np

dates  = pd.date_range('20140101',periods=10,freq='1D')
others = np.arange(0,5)
index  = [(d,o) for o in others for d in dates]
index  = pd.MultiIndex.from_tuples(index, names=['DATES','LAT'])
data   = np.random.randint(0,20,(50,10))

df = pd.DataFrame(data=data,index=index,columns=np.arange(0,10))
df.columns.names = ['LON']

If I were using arrays, I would normally stack them along the third dimension and then average over that dimension, e.g.

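# placeholder slice so np.dstack has something to append to; dropped below
mat = np.ones((5, 10, 1))

# stack on a day-by-day basis so lat/lon pairs sit on top of each other
# along the third dimension
for heute in df.index.get_level_values(0).unique():
    tmp = df.xs(heute, level=0)
    # DataFrame.as_matrix() was removed in pandas 1.0; to_numpy() replaces it
    mat = np.dstack((mat, tmp.to_numpy()))

# skip the placeholder slice and average over the days
ave = mat[:, :, 1:].mean(axis=2)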
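For reference, the same stacking can be sketched without the loop by reshaping the underlying array into a [lat, day, lon] cube. This is only a sketch under the assumption that the grid is complete (every latitude present for every date); the names `ordered`, `cube`, `n_lat` and `n_day` are illustrative and not part of the original code.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from above (seeded here only for reproducibility).
np.random.seed(1)
dates = pd.date_range('20140101', periods=10, freq='D')
lats = np.arange(0, 5)
index = pd.MultiIndex.from_tuples([(d, o) for o in lats for d in dates],
                                  names=['DATES', 'LAT'])
df = pd.DataFrame(np.random.randint(0, 20, (50, 10)), index=index,
                  columns=pd.Index(np.arange(10), name='LON'))

# Order the rows as (LAT, DATES) so a plain reshape gives [lat, day, lon].
# Assumption: every LAT value occurs for every date (a complete grid).
ordered = df.reorder_levels(['LAT', 'DATES']).sort_index()
n_lat = ordered.index.get_level_values('LAT').nunique()
n_day = ordered.index.get_level_values('DATES').nunique()
cube = ordered.to_numpy().reshape(n_lat, n_day, -1)

# Average over the day axis, leaving a lat x lon array.
ave = cube.mean(axis=1)
```

If some latitude/date combinations are missing, the reshape would either fail or silently misalign rows, so the loop is the safer variant in that case.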

While this would work, I suspect there is a way of doing this within pandas, but I do not know where to start. I have played around with groupby and resample, but I have been unable to make them work. Any help would be appreciated.

Here we go:

import pandas as pd, numpy as np
pd.set_option('display.float_format',lambda x: '{:,.1f}'.format(x))
np.random.seed(1)

dates  = pd.date_range('20140101',periods=10,freq='1D')
others = np.arange(0,5)
index  = [(d,o) for o in others for d in dates]
index  = pd.MultiIndex.from_tuples(index, names=['DATES','LAT'])
data   = np.random.randint(0,20,(50,10))

df = pd.DataFrame(data=data,index=index,columns=np.arange(0,10))
df.columns.names = ['LON']

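# answer
# move LON from the columns into the index, giving a Series indexed by (DATES, LAT, LON)
df = df.stack()
# average over all dates for each (LAT, LON) pair
df = df.groupby(level=['LAT', 'LON']).mean()
# pivot LON back into the columns to recover the lat x lon grid
print(df.unstack(level=['LON']))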

which yields:

LON    0    1    2    3    4    5    6    7    8    9
LAT                                                  
0    8.8  8.5 10.8  9.2  9.0 10.8  9.3  9.3  7.6  9.1
1   10.6  8.5 10.6 12.2  8.0  8.8  9.5 11.3 10.8  9.5
2   11.0 10.3  8.2 11.2  9.9  8.4 13.5  9.7  7.8  9.0
3    8.1  6.2  8.8 12.6 10.6  7.1  8.8  9.3 11.7 10.2
4    9.1 10.1  7.8  8.7  7.4  7.3 10.2 11.9  8.3 11.9

Whilst your array approach yields:

[[  8.8   8.5  10.8   9.2   9.   10.8   9.3   9.3   7.6   9.1]
 [ 10.6   8.5  10.6  12.2   8.    8.8   9.5  11.3  10.8   9.5]
 [ 11.   10.3   8.2  11.2   9.9   8.4  13.5   9.7   7.8   9. ]
 [  8.1   6.2   8.8  12.6  10.6   7.1   8.8   9.3  11.7  10.2]
 [  9.1  10.1   7.8   8.7   7.4   7.3  10.2  11.9   8.3  11.9]]
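Since the question asks for annual means, the same groupby idea can be extended by also grouping on the calendar year taken from the DATES level. The following is a minimal sketch rather than a definitive recipe: the date range is made up so that it spans two years, and the names `years`, `lat`, `annual` and `mean_2014` are illustrative.

```python
import numpy as np
import pandas as pd

# Same layout as above, but with made-up dates that cross a year boundary
# (5 days in 2014 and 5 days in 2015).
np.random.seed(1)
dates = pd.date_range('2014-12-27', periods=10, freq='D')
lats = np.arange(0, 5)
index = pd.MultiIndex.from_tuples([(d, o) for o in lats for d in dates],
                                  names=['DATES', 'LAT'])
df = pd.DataFrame(np.random.randint(0, 20, (50, 10)), index=index,
                  columns=pd.Index(np.arange(10), name='LON'))

# Grouping keys: the calendar year of each row and its latitude.
years = df.index.get_level_values('DATES').year.rename('YEAR')
lat = df.index.get_level_values('LAT')

# One row per (YEAR, LAT); each LON column is averaged independently.
annual = df.groupby([years, lat]).mean()

# The lat x lon grid for a single year, e.g. as input for imshow.
mean_2014 = annual.loc[2014]
```

Because the longitudes are already columns, no stack/unstack round trip is needed here; `mean_2014.to_numpy()` gives a plain 2-D array for plotting.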
