
Pandas: how to add data to a MultiIndex empty DataFrame?

I would like to use a MultiIndex DataFrame to easily select portions of the DataFrame. I created an empty DataFrame as follows:

import pandas as pd

mi = {'input':['a','b','c'],'optim':['pareto','alive']}
mi = pd.MultiIndex.from_tuples([(c,k) for c in mi.keys() for k in mi[c]])
# Empty two-level column index; the `labels` argument is called `codes` in pandas 0.24 and later
mc = pd.MultiIndex(names=['Generation','Individual'],codes=[[],[]],levels=[[],[]])
population = pd.DataFrame(index=mi,columns=mc)

which seems to be good. However, I do not know how to insert a single value to start populating my DataFrame. I tried the following:

population.loc[('optim','pareto'),(0,0)]=True

Here I tried to define a new two-level column key (0,0), which raised a NotImplementedError. I also tried (0,1), which raised a ValueError.

I also tried without a column index:

population.loc[('optim','pareto')]=True

This gave no error, but it did not change the DataFrame either. Any help? Thanks in advance.

EDIT To clarify my question, once populated, my DataFrame should look like this:

Generation     1               2
Individual     1    2    3     4    5     6
input       a  1    1    2     ...
            b  1    2    2     ...
            c  1    1    2     ...
optim  pareto  True True False ...
        alive  True True False ...

EDIT 2 I found out that what I was doing works if I define my first column at the DataFrame creation. In particular with:

mc = pd.MultiIndex.from_tuples([(0,0)])

I get a first column full of NaN and I can add data as I wanted to (also for new columns):

population.loc[('optim','pareto'),(0,1)]=True

I still do not know what is wrong with my first definition...

Even if I do not know why my initial definition was wrong, the following works as expected:

mi = {'input':['a','b','c'],'optim':['pareto','alive']}
mi = pd.MultiIndex.from_tuples([(c,k) for c in mi.keys() for k in mi[c]])
mc = pd.MultiIndex.from_tuples([(0,0)],names=['Generation','Individual'])
population = pd.DataFrame(index=mi,columns=mc)

It looks like the solution was to initialize the columns at the DataFrame creation (here with a (0,0) column). The created DataFrame is then:

Generation      0
Individual      0
input a       NaN
      b       NaN
      c       NaN
optim pareto  NaN
      alive   NaN

which can then be populated by adding values to the current column or to new columns/rows.
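As a minimal sketch of that last step (not necessarily the only way to do it), the snippet below rebuilds the DataFrame from above, fills one cell of the existing (0,0) column, and then adds a second column. The values written into column (0,1) and the variable name gen0 are made up for illustration; adding the column with plain `population[(0, 1)] = ...` is one option among several.

```python
import pandas as pd

# Same construction as above: the columns start with one (0, 0) entry.
mi = {'input': ['a', 'b', 'c'], 'optim': ['pareto', 'alive']}
mi = pd.MultiIndex.from_tuples([(c, k) for c in mi.keys() for k in mi[c]])
mc = pd.MultiIndex.from_tuples([(0, 0)], names=['Generation', 'Individual'])
population = pd.DataFrame(index=mi, columns=mc)

# Fill a single cell of the existing column: row key first, column key second.
population.loc[('optim', 'pareto'), (0, 0)] = True

# Add a new column for generation 0, individual 1 (illustrative values only).
population[(0, 1)] = [1, 2, 2, False, True]

# Selecting on the first column level returns every individual of generation 0.
gen0 = population[0]
print(population)
print(gen0)
```

Cells that were never assigned, such as ('input','a') in column (0,0), stay NaN until a value is written to them.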


 