Pandas DataFrame from MultiIndex and NumPy structured array (recarray)
First I create a two-level MultiIndex:
import numpy as np
import pandas as pd
ind = pd.MultiIndex.from_product([('X','Y'), ('a','b')])
I can use it like this:
pd.DataFrame(np.zeros((3,4)), columns=ind)
Which gives:
     X         Y
     a    b    a    b
0  0.0  0.0  0.0  0.0
1  0.0  0.0  0.0  0.0
2  0.0  0.0  0.0  0.0
But now I'm trying to do this:
dtype = [('Xa','f8'), ('Xb','i4'), ('Ya','f8'), ('Yb','i4')]
pd.DataFrame(np.zeros(3, dtype), columns=ind)
But that gives:
Empty DataFrame
Columns: [(X, a), (X, b), (Y, a), (Y, b)]
Index: []
I expected something like the previous result, with three rows.
Perhaps more generally, what I want to do is generate a Pandas DataFrame with MultiIndex columns where the columns have distinct types (as in the example, a is float but b is int).
This looks like a bug, and is worth reporting as an issue on GitHub.
A workaround is to set the columns manually after construction:
In [11]: df1 = pd.DataFrame(np.zeros(3, dtype))
In [12]: df1.columns = ind
In [13]: df1
Out[13]:
     X       Y
     a  b    a  b
0  0.0  0  0.0  0
1  0.0  0  0.0  0
2  0.0  0  0.0  0
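Since the goal is columns with distinct types, it may help to confirm that relabelling the columns leaves the per-column dtypes alone. The following is a minimal sketch of that check, reusing the `ind` and `dtype` definitions from the question; the name `arr` is introduced here only for illustration.

```python
import numpy as np
import pandas as pd

# Two-level column index and mixed field types, both taken from the question.
ind = pd.MultiIndex.from_product([('X', 'Y'), ('a', 'b')])
dtype = [('Xa', 'f8'), ('Xb', 'i4'), ('Ya', 'f8'), ('Yb', 'i4')]
arr = np.zeros(3, dtype)  # 'arr' is an illustrative name, not from the question

# Build the frame from the structured array first, then relabel the columns.
df = pd.DataFrame(arr)
df.columns = ind

# Each column should keep the dtype of the field it came from.
print(df.dtypes)
```

Assigning to `df.columns` only replaces the labels, so this relies on the MultiIndex listing its entries in the same order as the fields of the structured array.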
pd.DataFrame(np.zeros(3, dtype), columns=ind)
Empty DataFrame
Columns: [(X, a), (X, b), (Y, a), (Y, b)]
Index: []
is just showing the textual representation of the DataFrame output.
Columns: [(X, a), (X, b), (Y, a), (Y, b)]
is then just the text representation of the index.
If you instead run:
df = pd.DataFrame(np.zeros(3, dtype), columns=ind)
print(type(df.columns))
<class 'pandas.indexes.multi.MultiIndex'>
You can see it is indeed a pd.MultiIndex.
With that out of the way, what I don't understand is why specifying the index in the DataFrame constructor removes the values.
A workaround is this:
df = pd.DataFrame(np.zeros(3, dtype))
df.columns = ind
print(df)
     X       Y
     a  b    a  b
0  0.0  0  0.0  0
1  0.0  0  0.0  0
2  0.0  0  0.0  0
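Another option, not taken from the answers above, is to hand the constructor a dict keyed by tuples, which pandas turns into MultiIndex columns. This is only a sketch: it assumes the MultiIndex labels and the structured-array fields are listed in the same order, and the names `arr`, `data` and `df2` are illustrative.

```python
import numpy as np
import pandas as pd

# Same definitions as in the question.
ind = pd.MultiIndex.from_product([('X', 'Y'), ('a', 'b')])
dtype = [('Xa', 'f8'), ('Xb', 'i4'), ('Ya', 'f8'), ('Yb', 'i4')]
arr = np.zeros(3, dtype)

# Pair each MultiIndex label (a tuple such as ('X', 'a')) with the field at
# the same position. Assumption: both sequences are in matching order.
data = {label: arr[name] for label, name in zip(ind, arr.dtype.names)}

# Tuple keys become a MultiIndex on the columns; each field keeps its dtype.
df2 = pd.DataFrame(data)
print(df2.dtypes)
```

This keeps the mapping from field name to column label explicit, at the cost of a little more code than assigning `df.columns` afterwards.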