
Pandas Dataframe from dict with MultiIndex columns

I just started using pandas today. I found a tutorial that creates a table that looks like this:

foo  one  two   
bar    a    b  c
2      0    0  0
4      0    0  0
6      0    0  0

from this code:

import numpy as np
import pandas as pd

# Two levels: 'foo' is ['one', 'two', 'two'], 'bar' is ['a', 'b', 'c']
arrays = [np.hstack([['one']*1, ['two']*2]), ['a', 'b', 'c']]
columns = pd.MultiIndex.from_arrays(arrays, names=['foo', 'bar'])
df = pd.DataFrame(np.zeros((3, 3)), columns=columns, index=['2', '4', '6'])
print(df)
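To see why the dictionary attempt below behaves as it does, it helps to look at what the column labels of this MultiIndex actually are. A minimal check (the level values are written as a plain list here, which is equivalent to the np.hstack call above):

```python
import pandas as pd

columns = pd.MultiIndex.from_arrays(
    [['one', 'two', 'two'], ['a', 'b', 'c']], names=['foo', 'bar'])

# Iterating or listing a MultiIndex yields one tuple per column,
# with one entry per level ('foo', then 'bar')
labels = columns.tolist()
print(labels)           # [('one', 'a'), ('two', 'b'), ('two', 'c')]
print(columns.nlevels)  # 2
```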

I am trying to do the same thing, but create the DataFrame from a dictionary:

d = {'a': [0, 0, 0], 'b': [0, 0, 0], 'c': [0, 0, 0]}
dd = pd.DataFrame(d, columns=columns, index=['2', '4', '6'])
print(dd)

However, I get:

foo  one  two     
bar    a    b    c
2    NaN  NaN  NaN
4    NaN  NaN  NaN
6    NaN  NaN  NaN

Omitting columns=columns yields the DataFrame I expect, but without the MultiIndex columns. How can I get MultiIndex columns on a DataFrame created from a dictionary? The docs seem to cover MultiIndex columns only for NumPy arrays. I would use NumPy, but I ran into problems creating arrays when the rows were not all the same length: I only got a 1-D NumPy array. My data will most likely be strings, if that matters.

If you pass a dict with keys 'a', 'b', and 'c', you're telling pandas that the columns are named 'a', 'b', and 'c'. But your columns aren't named that. With a MultiIndex, a column doesn't have a single name but a tuple of names, one for each level. When you also pass columns=columns, pandas keeps only the dict entries whose keys match those column labels; none of them match, so every column is filled with NaN. So you need to key the data by the full tuple for each column:
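A quick way to confirm this diagnosis is to compare the dict keys with the MultiIndex labels directly. This sketch rebuilds the question's columns with a plain list for the 'foo' level:

```python
import pandas as pd

columns = pd.MultiIndex.from_arrays(
    [['one', 'two', 'two'], ['a', 'b', 'c']], names=['foo', 'bar'])
d = {'a': [0, 0, 0], 'b': [0, 0, 0], 'c': [0, 0, 0]}

# The dict keys are plain strings, but each requested column label is a tuple,
# so no column label is found among the keys
missing = [label for label in columns if label not in d]
print(missing)  # all three tuples are reported as missing

# pandas therefore fills every requested column with NaN
dd = pd.DataFrame(d, columns=columns, index=['2', '4', '6'])
print(dd.isna().all().all())  # True
```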

d={('one', 'a'):[0,0,0], ('two', 'b'):[0,0,0], ('two', 'c'):[0,0,0]}
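Putting it together, here is a sketch of the corrected version of the question's code. With tuple keys, passing columns=columns now selects real data instead of NaN. The second part addresses the uneven-length string data mentioned in the question; it assumes that padding short columns with NaN is acceptable, and relies on the fact that pd.DataFrame raises a ValueError for plain lists of different lengths, whereas wrapping each list in pd.Series lets pandas align them on their index. The names ragged and rd are only illustrative.

```python
import pandas as pd

columns = pd.MultiIndex.from_arrays(
    [['one', 'two', 'two'], ['a', 'b', 'c']], names=['foo', 'bar'])

# Keys are full column tuples: (value for level 'foo', value for level 'bar')
d = {('one', 'a'): [0, 0, 0],
     ('two', 'b'): [0, 0, 0],
     ('two', 'c'): [0, 0, 0]}

# The keys now match the MultiIndex labels, so the data is kept
dd = pd.DataFrame(d, columns=columns, index=['2', '4', '6'])
print(dd)

# Hypothetical uneven-length string data: each list becomes a Series,
# and pandas pads the shorter ones with NaN when aligning on the index
ragged = {('one', 'a'): ['x', 'y', 'z'],
          ('two', 'b'): ['p', 'q'],
          ('two', 'c'): ['m']}
rd = pd.DataFrame({key: pd.Series(values) for key, values in ragged.items()})
# Tuple keys produce MultiIndex columns; name the levels to match the tutorial
rd.columns.names = ['foo', 'bar']
print(rd)
```

Without columns=..., tuple keys alone are enough for pandas to build MultiIndex columns; passing columns explicitly also fixes the level names and column order.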
