Pandas Multi-Index DataFrame to Numpy Ndarray

I am trying to convert a multi-index pandas DataFrame into a numpy.ndarray. The DataFrame is below:

               s1  s2   s3   s4
Action State                   
1      s1     0.0   0  0.8  0.2
       s2     0.1   0  0.9  0.0
2      s1     0.0   0  0.9  0.1
       s2     0.0   0  1.0  0.0

I would like the resulting numpy.ndarray to be the following, with np.shape() = (2,2,4):

[[[ 0.0  0.0  0.8  0.2 ]
  [ 0.1  0.0  0.9  0.0 ]]

 [[ 0.0  0.0  0.9  0.1 ]
  [ 0.0  0.0  1.0  0.0]]]

I have tried df.as_matrix(), but this returns:

 [[ 0.   0.   0.8  0.2]
  [ 0.1  0.   0.9  0. ]
  [ 0.   0.   0.9  0.1]
  [ 0.   0.   1.   0. ]]

How do I return a list of lists for the first level, with each list representing the records for one Action?

You could use the following:

dim = len(df.index.get_level_values(0).unique())
# -1 lets NumPy infer the size of the second axis (rows per group)
result = df.values.reshape((dim, -1, df.shape[1]))
print(result)
[[[ 0.   0.   0.8  0.2]
  [ 0.1  0.   0.9  0. ]]

 [[ 0.   0.   0.9  0.1]
  [ 0.   0.   1.   0. ]]]

The first line just finds the number of groups in the first index level, which is what you would otherwise group by.

Why this (or groupby) is needed: as soon as you use .values, you lose the dimensionality of the MultiIndex from pandas. So you need to re-pass that dimensionality to NumPy in some way.
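To make the point about lost dimensionality concrete, here is a minimal sketch that rebuilds the DataFrame from the question and reshapes it using the level sizes reported by the index. It assumes the index is sorted and contains every (Action, State) combination; the variable names `flat` and `cube` are only for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the DataFrame from the question (values copied from the post).
index = pd.MultiIndex.from_product(
    [[1, 2], ["s1", "s2"]], names=["Action", "State"]
)
df = pd.DataFrame(
    [[0.0, 0, 0.8, 0.2],
     [0.1, 0, 0.9, 0.0],
     [0.0, 0, 0.9, 0.1],
     [0.0, 0, 1.0, 0.0]],
    index=index,
    columns=["s1", "s2", "s3", "s4"],
)

# The plain array is 2-D: both index levels are flattened into the row axis.
flat = df.to_numpy()
print(flat.shape)

# levshape holds the size of each index level; append the column count
# to hand the full 3-D shape back to NumPy.
cube = flat.reshape(df.index.levshape + (df.shape[1],))
print(cube.shape)
```

The first axis of `cube` then corresponds to Action, the second to State, and the third to the columns.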

One way:

In [151]: df.groupby(level=0).apply(lambda x: x.values.tolist()).values
Out[151]:
array([[[0.0, 0.0, 0.8, 0.2], 
        [0.1, 0.0, 0.9, 0.0]],
       [[0.0, 0.0, 0.9, 0.1],
        [0.0, 0.0, 1.0, 0.0]]], dtype=object)
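Note the dtype=object in the output above: each group is stored as a Python list rather than as part of one numeric array. If a regular float array is wanted, one possible variation (not part of the original answer) is to stack the per-group blocks with np.stack. This sketch assumes every Action has the same number of State rows; otherwise np.stack raises an error.

```python
import numpy as np
import pandas as pd

# Rebuild the DataFrame from the question (values copied from the post).
index = pd.MultiIndex.from_product(
    [[1, 2], ["s1", "s2"]], names=["Action", "State"]
)
df = pd.DataFrame(
    [[0.0, 0, 0.8, 0.2],
     [0.1, 0, 0.9, 0.0],
     [0.0, 0, 0.9, 0.1],
     [0.0, 0, 1.0, 0.0]],
    index=index,
    columns=["s1", "s2", "s3", "s4"],
)

# One 2-D block per Action, collected in the order groupby yields them.
blocks = [group.to_numpy() for _, group in df.groupby(level=0)]

# Stack the equally sized blocks along a new leading axis.
result = np.stack(blocks)
print(result.shape, result.dtype)
```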

Using Divakar's suggestion, np.reshape() worked:

>>> print(P)

              s1  s2   s3   s4
Action State                   
1      s1     0.0   0  0.8  0.2
       s2     0.1   0  0.9  0.0
2      s1     0.0   0  0.9  0.1
       s2     0.0   0  1.0  0.0

>>> arr = np.reshape(P.to_numpy(), (2, 2, -1))
>>> print(arr)

[[[ 0.   0.   0.8  0.2]
  [ 0.1  0.   0.9  0. ]]

 [[ 0.   0.   0.9  0.1]
  [ 0.   0.   1.   0. ]]]

>>> np.shape(arr)

(2, 2, 4)

Elaborating on Brad Solomon's answer, to get a slightly more generic solution (index levels of different sizes and an arbitrary number of levels), one could do something like this:

def df_to_numpy(df):
    try:
        shape = [len(level) for level in df.index.levels]
    except AttributeError:
        shape = [len(df.index)]
    ncol = df.shape[-1]
    if ncol > 1:
        shape.append(ncol)
    return df.to_numpy().reshape(shape)

If df has missing sub-indexes, reshape will not work. One way to add them would be (maybe there are better solutions):

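import pandas as pd

def enforce_df_shape(df):
    try:
        # Full Cartesian product of all index levels
        ind = pd.MultiIndex.from_product([level.values for level in df.index.levels])
    except AttributeError:
        # Plain single-level Index: there is nothing to fill in
        return df
    # Float placeholder so update() does not have to change the column dtype
    fulldf = pd.DataFrame(-1.0, columns=df.columns, index=ind)  # remove -1.0 to fill fulldf with nan
    fulldf.update(df)
    return fulldf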
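As a possible alternative to the helper above, DataFrame.reindex can fill in the missing index combinations directly. The sketch below is only an illustration: the example frame (with the row for Action 2, State s1 left out) is hypothetical, and missing rows are filled with NaN rather than -1.

```python
import numpy as np
import pandas as pd

# Hypothetical frame in which the (2, "s1") row is missing.
index = pd.MultiIndex.from_tuples(
    [(1, "s1"), (1, "s2"), (2, "s2")], names=["Action", "State"]
)
df = pd.DataFrame(
    [[0.0, 0.0, 0.8, 0.2],
     [0.1, 0.0, 0.9, 0.0],
     [0.0, 0.0, 1.0, 0.0]],
    index=index,
    columns=["s1", "s2", "s3", "s4"],
)

# Every combination of the existing level values.
full_index = pd.MultiIndex.from_product(df.index.levels, names=df.index.names)

# Rows that were absent in df come back as NaN.
full = df.reindex(full_index)

# With a complete index the reshape to (levels..., columns) is well defined.
cube = full.to_numpy().reshape(full_index.levshape + (df.shape[1],))
print(cube.shape)
```

Passing fill_value to reindex would give a sentinel such as -1 instead of NaN.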

If you are just trying to pull out one column, say s1, and get an array with shape (2,2), you can use .index.levshape like this:

x = df.s1.to_numpy().reshape(df.index.levshape)

This will give you a (2,2) array containing the values of s1.
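For comparison, here is a short sketch that runs the levshape reshape on the DataFrame from the question and checks it against Series.unstack, a different technique that pivots the State level into columns. The names `by_reshape` and `by_unstack` are illustrative only.

```python
import numpy as np
import pandas as pd

# Rebuild the DataFrame from the question (values copied from the post).
index = pd.MultiIndex.from_product(
    [[1, 2], ["s1", "s2"]], names=["Action", "State"]
)
df = pd.DataFrame(
    [[0.0, 0, 0.8, 0.2],
     [0.1, 0, 0.9, 0.0],
     [0.0, 0, 0.9, 0.1],
     [0.0, 0, 1.0, 0.0]],
    index=index,
    columns=["s1", "s2", "s3", "s4"],
)

# Reshape the single column using the sizes of the index levels.
by_reshape = df["s1"].to_numpy().reshape(df.index.levshape)

# Pivot the State level into columns, then drop down to NumPy.
by_unstack = df["s1"].unstack("State").to_numpy()

print(by_reshape.shape)
print(np.array_equal(by_reshape, by_unstack))
```

Unlike the plain reshape, unstack does not require a complete index: absent combinations simply appear as NaN.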
