How to create an index on a ndarray using pandas

I'm really intrigued by the axis indexing that pandas provides. I've been working with numpy lately and have an array that holds the positions (XYZ) of a number of particles (1 ... N) at a number of times (0.0 ... T). So that would be a three-dimensional (T, N, 3) array.

import numpy as np
D = np.random.random((10, 20, 3))  # shape (T, N, 3)

Now I'd like to add pandas indexing to the appropriate axes to make it easier to access certain time frames or a certain selection of atoms. Let's say I'd like to attach the following index labels to the data:

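# assumes numpy has been imported as np
T_index = np.arange(10, dtype='f')  # time labels 0.0 ... 9.0
N_index = np.arange(20)             # particle labels 0 ... 19
P_index = ["x", "y", "z"]           # coordinate labels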

I've looked around but have not found a convenient way of adding those to a pandas DataFrame. I'm also not quite sure whether a pandas DataFrame is really the data structure I should be using, because it might break up the originally nicely formed numpy ndarray into something where convenient numpy functions like mean() or sum() would be much slower.

Since you have 3 axes, defining a Panel might be most convenient:

import pandas as pd

pan = pd.Panel(D, items=T_index, major_axis=N_index, minor_axis=P_index)
# <class 'pandas.core.panel.Panel'>
# Dimensions: 10 (items) x 20 (major_axis) x 3 (minor_axis)
# Items axis: 0.0 to 9.0
# Major_axis axis: 0 to 19
# Minor_axis axis: x to z

Then, if you wish to convert that to a DataFrame, use:

df = pan.to_frame()
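
Note that Panel was deprecated in pandas 0.20 and removed in 0.25, so the calls above only run on older versions. On a current pandas, one possible substitute is to build a DataFrame with a MultiIndex directly from the array. The following is a minimal sketch, not a drop-in replacement for pan.to_frame(): the row/column layout and the level names "time" and "particle" are choices made for this illustration, and the time labels are float64 rather than the question's dtype='f'.

```python
import numpy as np
import pandas as pd

# Same shapes and labels as in the question; the seed is arbitrary.
rng = np.random.default_rng(0)
D = rng.random((10, 20, 3))            # (T, N, 3)
T_index = np.arange(10, dtype=float)   # float64 here (assumption) for simple label lookups
N_index = np.arange(20)
P_index = ["x", "y", "z"]

# One row per (time, particle) pair, one column per coordinate.
# The level names "time" and "particle" are made up for this example.
index = pd.MultiIndex.from_product([T_index, N_index], names=["time", "particle"])
df = pd.DataFrame(D.reshape(-1, 3), index=index, columns=P_index)

frame0 = df.xs(0.0, level="time")            # all particles at t = 0.0
atom5 = df.xs(5, level="particle")           # trajectory of particle 5
mean_pos = df.groupby(level="time").mean()   # mean position per time frame
```

The reshape works because D is laid out with time as the outer axis and particle as the inner one, which is the same order in which MultiIndex.from_product enumerates its label pairs.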

The underlying data in pan is still held in a single numpy array of shape (10, 20, 3):

In [50]: pan._data
BlockManager
...
FloatBlock: [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0], 10 x 20 x 3, dtype: float64

So I wouldn't expect any significant deterioration in speed. And you can always drop back to numpy operations on the numpy array pan.values if need be, though hopefully that will be unnecessary.
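
The same fallback can be sketched for a labeled container that exists in current pandas. The example below assumes the data is stored as a DataFrame with a (time, particle) MultiIndex instead of a Panel; that layout and its level names are assumptions for illustration only. It recovers the (T, N, 3) array and applies plain numpy reductions to it.

```python
import numpy as np
import pandas as pd

# Labeled data in an assumed layout: rows are (time, particle), columns are x/y/z.
rng = np.random.default_rng(0)
D = rng.random((10, 20, 3))
index = pd.MultiIndex.from_product(
    [np.arange(10, dtype=float), np.arange(20)], names=["time", "particle"]
)
df = pd.DataFrame(D.reshape(-1, 3), index=index, columns=["x", "y", "z"])

# Recover the original 3-D array from the frame.
n_times = len(df.index.levels[0])
n_particles = len(df.index.levels[1])
arr = df.to_numpy().reshape(n_times, n_particles, 3)

# Ordinary numpy reductions on the recovered array.
centroid = arr.mean(axis=1)      # mean position per time frame, shape (10, 3)
total = arr.sum(axis=(0, 1))     # sum over all times and particles, shape (3,)
```

This round trip relies on the rows being sorted with time as the outer level and particle as the inner level; a frame that has been reordered would need a sort_index() call first.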
