
Replace column of pandas multi-index DataFrame with another DataFrame

I have a pandas DataFrame like this:

import pandas as pd
import numpy as np

data1 = np.repeat(np.array(range(3), ndmin=2), 3, axis=0)
columns1 = pd.MultiIndex.from_tuples([('foo', 'a'), ('foo', 'b'), ('bar', 'c')])
df1 = pd.DataFrame(data1, columns=columns1)
print(df1)

  foo    bar
    a  b   c
0   0  1   2
1   0  1   2
2   0  1   2

And another one like this:

data2 = np.repeat(np.array(range(3, 5), ndmin=2), 3, axis=0)
columns2 = ['d', 'e']
df2 = pd.DataFrame(data2, columns=columns2)
print(df2)

   d  e
0  3  4
1  3  4
2  3  4

Now I would like to replace the 'bar' columns of df1 with df2, but the usual single-level indexing syntax doesn't seem to work:

df1['bar'] = df2
print(df1)

  foo    bar
    a  b   c
0   0  1 NaN
1   0  1 NaN
2   0  1 NaN

What I would like to get instead is:

  foo    bar
    a  b   d  e
0   0  1   3  4
1   0  1   3  4
2   0  1   3  4

I'm not sure whether I'm missing something in the syntax or whether this is related to the issues described here and here. Could someone explain why this doesn't work and how to get the desired outcome?

I'm using Python 2.7 and pandas 0.24, in case it makes a difference.

For lack of a better alternative, I'm currently doing this:

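# Give df2 a two-level header: ('bar', 'd'), ('bar', 'e')
df2.columns = pd.MultiIndex.from_product([['bar'], df2.columns])
# Remove the existing 'bar' group from df1
df1.drop(columns='bar', level=0, inplace=True)
# Join on the row index; df2's columns are appended after df1's
df1 = df1.join(df2)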

This gives the desired result. Be careful if column order matters, though: join appends the new 'bar' columns after all remaining columns of df1, so a group that was not originally the last one ends up in a different position.
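If column order does matter, one option is to split df1 around the group being replaced and glue the pieces back together with pd.concat. The sketch below is a hedged illustration, not a pandas API: replace_top_level is a hypothetical helper name, and it assumes that df and the replacement frame share the same row index and that the group to replace is identified by its top-level label. The new columns are placed where the first original column of that group was.

```python
import numpy as np
import pandas as pd


def replace_top_level(df, key, new):
    """Hypothetical helper (not part of pandas): return a copy of `df` in which
    every column under top-level label `key` is replaced by the columns of
    `new`, inserted at the position of the first original `key` column."""
    new = new.copy()
    # Give the replacement a two-level header, e.g. ('bar', 'd'), ('bar', 'e').
    new.columns = pd.MultiIndex.from_product([[key], new.columns])

    top = df.columns.get_level_values(0)
    pos = list(top).index(key)        # position of the first `key` column
    left = df.iloc[:, :pos]           # columns before the group
    rest = df.iloc[:, pos:]
    # Columns from the group onwards, with every `key` column filtered out.
    right = rest.loc[:, rest.columns.get_level_values(0) != key]

    # Drop empty pieces, then place the rest side by side (aligned on the index).
    pieces = [p for p in (left, new, right) if p.shape[1] > 0]
    return pd.concat(pieces, axis=1)


# Frames from the question.
data1 = np.repeat(np.array(range(3), ndmin=2), 3, axis=0)
columns1 = pd.MultiIndex.from_tuples([('foo', 'a'), ('foo', 'b'), ('bar', 'c')])
df1 = pd.DataFrame(data1, columns=columns1)
data2 = np.repeat(np.array(range(3, 5), ndmin=2), 3, axis=0)
df2 = pd.DataFrame(data2, columns=['d', 'e'])

result = replace_top_level(df1, 'bar', df2)
print(result)
```

Because the original frames are not modified in place, this also sidesteps the inplace drop. If the row indexes differ, pd.concat performs an outer join on them, so unmatched rows would be filled with NaN.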

Having read further into the GitHub issues mentioned above, I think the reason the approach in the question doesn't work is indeed an inconsistency in the pandas API that hasn't been fixed yet.
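As far as I can tell from reading the pandas source of that era (an assumption, not something documented in the issues cited above), when the columns are a MultiIndex and the key selects a sub-frame, df1['bar'] = df2 first reindexes df2 to the sub-columns that already exist under 'bar'. Here that is only 'c', which df2 does not have, so the assigned values are all NaN. The sketch below reproduces that alignment step on its own with DataFrame.reindex. Later pandas releases may behave differently for the actual assignment.

```python
import numpy as np
import pandas as pd

# Frames from the question.
columns1 = pd.MultiIndex.from_tuples([('foo', 'a'), ('foo', 'b'), ('bar', 'c')])
df1 = pd.DataFrame(np.repeat(np.array(range(3), ndmin=2), 3, axis=0), columns=columns1)
df2 = pd.DataFrame(np.repeat(np.array(range(3, 5), ndmin=2), 3, axis=0), columns=['d', 'e'])

# Sub-columns currently under 'bar' (just 'c').
target_cols = df1['bar'].columns

# Assumed alignment step: df2 is reindexed to those sub-columns before assignment.
# df2 has no column 'c', so the aligned frame is a single all-NaN column.
aligned = df2.reindex(columns=target_cols)
print(aligned)
```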
