Pandas Multi-Index DataFrame to NumPy ndarray

Question

I am trying to convert a multi-index pandas DataFrame into a numpy.ndarray. The DataFrame is below:

               s1  s2   s3   s4
Action State                   
1      s1     0.0   0  0.8  0.2
       s2     0.1   0  0.9  0.0
2      s1     0.0   0  0.9  0.1
       s2     0.0   0  1.0  0.0

I would like the resulting numpy.ndarray to look like the following, with shape (2, 2, 4):

[[[ 0.0  0.0  0.8  0.2 ]
  [ 0.1  0.0  0.9  0.0 ]]

 [[ 0.0  0.0  0.9  0.1 ]
  [ 0.0  0.0  1.0  0.0]]]

I have tried df.as_matrix(), but this returns:

 [[ 0.   0.   0.8  0.2]
  [ 0.1  0.   0.9  0. ]
  [ 0.   0.   0.9  0.1]
  [ 0.   0.   1.   0. ]]

How do I return a list of lists for the first level, with each inner list holding the records for one Action?

Answer 1

You could use the following:

dim1 = len(df.index.get_level_values(0).unique())  # number of distinct Action values
result = df.values.reshape((dim1, -1, df.shape[1]))  # -1 lets NumPy infer the rows per Action
print(result)
[[[ 0.   0.   0.8  0.2]
  [ 0.1  0.   0.9  0. ]]

 [[ 0.   0.   0.9  0.1]
  [ 0.   0.   1.   0. ]]]

The first line just finds the number of groups in the first index level (here, the number of distinct Action values).

Why this (or groupby) is needed: as soon as you use .values, you lose the dimensionality that the pandas MultiIndex encodes, so you need to pass that dimensionality back to NumPy in some way.
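To make the point about lost dimensionality concrete, here is a minimal sketch that rebuilds the DataFrame from the question (values copied from the post) and compares the shapes before and after the reshape. The names flat, cube and n_actions are only illustrative, and the sketch assumes every Action has the same number of State rows.

```python
import pandas as pd

# Rebuild the DataFrame from the question.
index = pd.MultiIndex.from_product(
    [[1, 2], ["s1", "s2"]], names=["Action", "State"]
)
df = pd.DataFrame(
    [[0.0, 0.0, 0.8, 0.2],
     [0.1, 0.0, 0.9, 0.0],
     [0.0, 0.0, 0.9, 0.1],
     [0.0, 0.0, 1.0, 0.0]],
    index=index,
    columns=["s1", "s2", "s3", "s4"],
)

flat = df.to_numpy()  # 2-D: the MultiIndex structure is not carried over
n_actions = df.index.get_level_values(0).nunique()

# Hand the first-level size back to NumPy; -1 infers the rows per Action.
cube = flat.reshape(n_actions, -1, flat.shape[1])

print(flat.shape)  # (4, 4)
print(cube.shape)  # (2, 2, 4)
```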

Answer 2

One way:

In [151]: df.groupby(level=0).apply(lambda x: x.values.tolist()).values
Out[151]:
array([[[0.0, 0.0, 0.8, 0.2], 
        [0.1, 0.0, 0.9, 0.0]],
       [[0.0, 0.0, 0.9, 0.1],
        [0.0, 0.0, 1.0, 0.0]]], dtype=object)
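Note that the output above has dtype=object. If a regular float array is wanted instead, one possible variation (not part of the original answer) is to collect one 2-D block per Action and stack them with np.stack. This is only a sketch and assumes all groups have the same number of rows; the names blocks and stacked are illustrative.

```python
import numpy as np
import pandas as pd

# Same DataFrame as in the question.
index = pd.MultiIndex.from_product(
    [[1, 2], ["s1", "s2"]], names=["Action", "State"]
)
df = pd.DataFrame(
    [[0.0, 0.0, 0.8, 0.2],
     [0.1, 0.0, 0.9, 0.0],
     [0.0, 0.0, 0.9, 0.1],
     [0.0, 0.0, 1.0, 0.0]],
    index=index,
    columns=["s1", "s2", "s3", "s4"],
)

# One 2-D array per Action, stacked along a new leading axis.
blocks = [group.to_numpy() for _, group in df.groupby(level=0)]
stacked = np.stack(blocks)

print(stacked.dtype, stacked.shape)
```

np.stack raises an error if the blocks differ in shape, which makes unbalanced groups visible instead of silently producing a ragged result.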

Answer 3

Using Divakar's suggestion, np.reshape() worked:

>>> print(P)

              s1  s2   s3   s4
Action State                   
1      s1     0.0   0  0.8  0.2
       s2     0.1   0  0.9  0.0
2      s1     0.0   0  0.9  0.1
       s2     0.0   0  1.0  0.0

>>> result = np.reshape(P.to_numpy(), (2, 2, -1))
>>> print(result)

[[[ 0.   0.   0.8  0.2]
  [ 0.1  0.   0.9  0. ]]

 [[ 0.   0.   0.9  0.1]
  [ 0.   0.   1.   0. ]]]

>>> np.shape(result)

(2, 2, 4)

Answer 4

Elaborating on Brad Solomon's answer, to get a slightly more generic solution (index levels of different sizes and an arbitrary number of levels), one could do something like this:

def df_to_numpy(df):
    try:
        shape = [len(level) for level in df.index.levels]
    except AttributeError:
        shape = [len(df.index)]
    ncol = df.shape[-1]
    if ncol > 1:
        shape.append(ncol)
    return df.to_numpy().reshape(shape)

If df has missing sub-indexes, reshape will not work. One way to add them would be (maybe there are better solutions):

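import pandas as pd

def enforce_df_shape(df):
    try:
        # Full Cartesian product of all index levels
        ind = pd.MultiIndex.from_product([level.values for level in df.index.levels])
    except AttributeError:
        # Plain (non-Multi) index: nothing to fill in
        return df
    fulldf = pd.DataFrame(-1.0, columns=df.columns, index=ind)  # remove -1.0 to fill fulldf with nan
    fulldf.update(df)  # copy the existing rows over the placeholder values
    return fulldf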
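As a possible alternative to the update-based helper above, the missing rows can also be added with DataFrame.reindex against the full product of the index levels. The sketch below uses a hypothetical frame in which the (2, "s2") row is missing; the names full_index, full and cube are illustrative, and missing combinations are filled with NaN rather than -1.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: the (2, "s2") row is missing.
index = pd.MultiIndex.from_tuples(
    [(1, "s1"), (1, "s2"), (2, "s1")], names=["Action", "State"]
)
df = pd.DataFrame(
    [[0.0, 0.0, 0.8, 0.2],
     [0.1, 0.0, 0.9, 0.0],
     [0.0, 0.0, 0.9, 0.1]],
    index=index,
    columns=["s1", "s2", "s3", "s4"],
)

# Every combination of the index levels, in level order.
full_index = pd.MultiIndex.from_product(df.index.levels, names=df.index.names)

# Rows absent from df appear as all-NaN rows after reindexing.
full = df.reindex(full_index)

cube = full.to_numpy().reshape(*full_index.levshape, df.shape[1])
print(cube.shape)                  # (2, 2, 4)
print(np.isnan(cube[1, 1]).all())  # True: the filled-in row
```

Reindexing also puts the rows in level order, so the reshape does not depend on how the original frame was sorted.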

Answer 5

If you are just trying to pull out one column, say s1, and get an array with shape (2,2), you can use .index.levshape like this:

x = df.s1.to_numpy().reshape(df.index.levshape)

This will give you a (2,2) array containing the values of s1.
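A small runnable sketch of the levshape idea, using the DataFrame from the question. The second reshape extends the same idea to all columns at once; that extension is an assumption on my part rather than something the answer states, and both lines assume the index contains every Action/State combination in sorted order.

```python
import pandas as pd

# Same DataFrame as in the question.
index = pd.MultiIndex.from_product(
    [[1, 2], ["s1", "s2"]], names=["Action", "State"]
)
df = pd.DataFrame(
    [[0.0, 0.0, 0.8, 0.2],
     [0.1, 0.0, 0.9, 0.0],
     [0.0, 0.0, 0.9, 0.1],
     [0.0, 0.0, 1.0, 0.0]],
    index=index,
    columns=["s1", "s2", "s3", "s4"],
)

# levshape is a tuple with the length of each index level, here (2, 2).
x = df["s1"].to_numpy().reshape(df.index.levshape)

# All columns: append -1 so NumPy infers the column axis.
cube = df.to_numpy().reshape(*df.index.levshape, -1)

print(x.shape, cube.shape)
```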

5 answers

Answer 1: 5, accepted, 2017-09-06 20:06:17
Answer 2: 1, 2017-09-06 15:28:22
Answer 3: 0, 2017-09-06 20:15:55
Answer 4: 0, 2021-06-02 16:41:31
Answer 5: 0, 2022-12-20 04:41:07
