Pandas DataFrame from MultiIndex and NumPy structured array (recarray)

Question

First I create a two-level MultiIndex:

import numpy as np
import pandas as pd

ind = pd.MultiIndex.from_product([('X','Y'), ('a','b')])

I can use it like this:

pd.DataFrame(np.zeros((3,4)), columns=ind)

Which gives:

     X         Y     
     a    b    a    b
0  0.0  0.0  0.0  0.0
1  0.0  0.0  0.0  0.0
2  0.0  0.0  0.0  0.0

But now I'm trying to do this:

dtype = [('Xa','f8'), ('Xb','i4'), ('Ya','f8'), ('Yb','i4')]
pd.DataFrame(np.zeros(3, dtype), columns=ind)

But that gives:

Empty DataFrame
Columns: [(X, a), (X, b), (Y, a), (Y, b)]
Index: []

I expected something like the previous result, with three rows.

More generally, what I want is a Pandas DataFrame with MultiIndex columns where the columns have distinct types (as in the example, where a is float but b is int).

Answer 1

This looks like a bug, and it is worth reporting as an issue on GitHub.

A workaround is to set the columns manually after construction:

In [11]: df1 = pd.DataFrame(np.zeros(3, dtype))

In [12]: df1.columns = ind

In [13]: df1
Out[13]:
     X       Y
     a  b    a  b
0  0.0  0  0.0  0
1  0.0  0  0.0  0
2  0.0  0  0.0  0
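
To check that this workaround keeps the per-field types, the dtypes can be inspected after the columns are assigned. The following is a minimal sketch that reuses `ind` and `dtype` from the question; the names `arr` and `df1` are only illustrative.

```python
import numpy as np
import pandas as pd

ind = pd.MultiIndex.from_product([('X', 'Y'), ('a', 'b')])
dtype = [('Xa', 'f8'), ('Xb', 'i4'), ('Ya', 'f8'), ('Yb', 'i4')]

# Build the frame from the structured array first; each field becomes a column.
arr = np.zeros(3, dtype)
df1 = pd.DataFrame(arr)

# Then replace the flat field names with the two-level index.
df1.columns = ind

# Each column should keep the type of the field it came from.
print(df1.dtypes)
```

The a columns should come out as float64 and the b columns as int32, matching the fields of the structured array.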

Answer 2

pd.DataFrame(np.zeros(3, dtype), columns=ind)

Empty DataFrame
Columns: [(X, a), (X, b), (Y, a), (Y, b)]
Index: []

is just showing the textual representation of the DataFrame.

Columns: [(X, a), (X, b), (Y, a), (Y, b)]

is then just the text representation of the column index.

If you instead run:

df = pd.DataFrame(np.zeros(3, dtype), columns=ind)

print(type(df.columns))

<class 'pandas.indexes.multi.MultiIndex'>

you see it is indeed a pd.MultiIndex.

With that out of the way, what I don't understand is why passing the MultiIndex as columns to the DataFrame constructor removes the values.
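
One possible explanation, offered here as an assumption rather than a confirmed diagnosis: the pandas documentation says that when the data already carries column labels, the `columns` argument performs column selection instead of relabeling. A structured array carries its field names as labels, so `columns` would be matched against `Xa`, `Xb`, `Ya`, `Yb`, and the tuples in the MultiIndex match none of them. The sketch below only shows the selection behavior with names that do match; `arr` and `sel` are illustrative names.

```python
import numpy as np
import pandas as pd

dtype = [('Xa', 'f8'), ('Xb', 'i4'), ('Ya', 'f8'), ('Yb', 'i4')]
arr = np.zeros(3, dtype)

# With a structured array, `columns` picks fields by name
# rather than assigning new labels to them.
sel = pd.DataFrame(arr, columns=['Xa', 'Ya'])
print(sel)
```

Here only the two named fields are kept, which is consistent with `columns` acting as a selector on this kind of input.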

A workaround is this:

df = pd.DataFrame(np.zeros(3, dtype))

df.columns = ind

print(df)

     X       Y   
     a  b    a  b
0  0.0  0  0.0  0
1  0.0  0  0.0  0
2  0.0  0  0.0  0
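
For the more general goal in the question (MultiIndex columns with a distinct type per column), another option is to skip the structured array and pass a dict keyed by tuples, which pandas turns into MultiIndex columns. This is a sketch of an alternative, not something proposed in either answer; `data` and `df2` are illustrative names.

```python
import numpy as np
import pandas as pd

# Tuple keys become the levels of the column MultiIndex,
# and each array keeps its own dtype.
data = {
    ('X', 'a'): np.zeros(3, dtype='f8'),
    ('X', 'b'): np.zeros(3, dtype='i4'),
    ('Y', 'a'): np.zeros(3, dtype='f8'),
    ('Y', 'b'): np.zeros(3, dtype='i4'),
}
df2 = pd.DataFrame(data)

print(df2.dtypes)
```

This trades the compactness of a single structured array for explicit control over each column's label and type.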


Answer 1: 2, accepted, 2016-06-09 17:50:14

Answer 2: 1, 2016-06-09 17:51:19
