
How to merge two pandas DataFrames into a single MultiIndex DataFrame?

I have two DataFrames that are equally indexed, but each represents a different aspect of my full dataset.
For instance:

import pandas as pd
from datetime import date

df_price = pd.DataFrame(
    index=pd.date_range(start=date(2021, 1, 1), end=date(2021, 1, 3), freq="D"),
    columns=["A", "B", "C"],
    data={"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]}
)
df_quantity = pd.DataFrame(
    index=pd.date_range(start=date(2021, 1, 1), end=date(2021, 1, 3), freq="D"),
    columns=["A", "B", "C"],
    data={"A": [9, 8, 7], "B": [6, 5, 4], "C": [3, 2, 1]}
)

What I want is the equivalent of doing this:

index = pd.MultiIndex.from_product([["A", "B", "C"], ["price", "quantity"]], names=["first", "second"])
df_total = pd.DataFrame(
    index=pd.date_range(start=date(2021, 1, 1), end=date(2021, 1, 3), freq="D"),
    columns=index,
    data=[[1, 9, 4, 6, 7, 3], [2, 8, 5, 5, 8, 2], [3, 7, 6, 4, 9, 1]]
)

Output:

first          A              B              C         
second     price quantity price quantity price quantity
2021-01-01     1        9     4        6     7        3
2021-01-02     2        8     5        5     8        2
2021-01-03     3        7     6        4     9        1

Any ideas? I have tried the common methods of join and merge, but all I could do is add the columns with suffixes.

One option:

(i) join the two DataFrames

(ii) split the column names on '_' and, because from_tuples expects tuples, map the resulting sublists to tuples

(iii) use pd.MultiIndex.from_tuples to convert the columns to a MultiIndex

(iv) sort column names to match the desired outcome

df_total = df_price.join(df_quantity, lsuffix='_price', rsuffix='_quantity')
df_total.columns = pd.MultiIndex.from_tuples(map(tuple, df_total.columns.str.split('_')))
df_total = df_total.reindex(df_total.columns.sort_values(), axis=1)

Output:

               A              B              C         
           price quantity price quantity price quantity
2021-01-01     1        9     4        6     7        3
2021-01-02     2        8     5        5     8        2
2021-01-03     3        7     6        4     9        1
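
To make the four steps above concrete, here is a minimal sketch that keeps each intermediate result in its own variable. The names `joined`, `flat_names` and `pairs` are illustrative and not part of the original answer, and step (iv) uses `sort_index(axis=1)`, which should give the same column order as the `reindex` call.

```python
import pandas as pd

idx = pd.date_range("2021-01-01", "2021-01-03", freq="D")
df_price = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]}, index=idx)
df_quantity = pd.DataFrame({"A": [9, 8, 7], "B": [6, 5, 4], "C": [3, 2, 1]}, index=idx)

# (i) join: the overlapping column names get the suffixes appended
joined = df_price.join(df_quantity, lsuffix="_price", rsuffix="_quantity")
flat_names = list(joined.columns)  # e.g. 'A_price', ..., 'C_quantity'

# (ii) split every flat name on '_' and turn each [item, attribute] list into a tuple
pairs = [tuple(name.split("_")) for name in flat_names]

# (iii) build the two-level column index from the tuples
joined.columns = pd.MultiIndex.from_tuples(pairs)

# (iv) sort the columns so that each item's attributes sit next to each other
df_total = joined.sort_index(axis=1)
print(df_total)
```

Splitting on '_' assumes that none of the original column names contains an underscore. If they might, splitting only at the last underscore (for example with `name.rsplit("_", 1)`) is one possible workaround.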

I looked into the pd.concat() function and found this answer. It led me to write a function that solves my problem and can handle any number of DataFrame objects at once:

from typing import Dict
import pandas as pd

def in_merge(dfs_dict: Dict[str, pd.DataFrame]) -> pd.DataFrame:
    return (
        pd.concat(
            dfs_dict,
            axis=1,
            names=["attribute", "item"],
        )
        .swaplevel(axis=1)
        .sort_index(axis=1)
    )

Output for a dict of three DataFrames keyed price, quality and quantity:

|                     |   ('A', 'price') |   ('A', 'quality') |   ('A', 'quantity') |   ('B', 'price') |   ('B', 'quality') |   ('B', 'quantity') |   ('C', 'price') |   ('C', 'quality') |   ('C', 'quantity') |
|:--------------------|-----------------:|-------------------:|--------------------:|-----------------:|-------------------:|--------------------:|-----------------:|-------------------:|--------------------:|
| 2021-01-01 00:00:00 |                1 |                  4 |                   9 |                4 |                  5 |                   6 |                7 |                  6 |                   3 |
| 2021-01-02 00:00:00 |                2 |                  4 |                   8 |                5 |                  5 |                   5 |                8 |                  6 |                   2 |
| 2021-01-03 00:00:00 |                3 |                  4 |                   7 |                6 |                  5 |                   4 |                9 |                  6 |                   1 |
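
The call that produced this table is not shown, so the following is only a sketch of how the function might be used. `df_quality` is a hypothetical third DataFrame whose values were chosen to match the quality columns in the table, and the dict keys become the attribute level of the column index.

```python
from typing import Dict
import pandas as pd

def in_merge(dfs_dict: Dict[str, pd.DataFrame]) -> pd.DataFrame:
    # Same logic as the function above: concat on columns, then put the item level first
    return (
        pd.concat(dfs_dict, axis=1, names=["attribute", "item"])
        .swaplevel(axis=1)
        .sort_index(axis=1)
    )

idx = pd.date_range("2021-01-01", "2021-01-03", freq="D")
df_price = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]}, index=idx)
df_quantity = pd.DataFrame({"A": [9, 8, 7], "B": [6, 5, 4], "C": [3, 2, 1]}, index=idx)
# Hypothetical input, reconstructed from the quality columns of the table
df_quality = pd.DataFrame({"A": [4, 4, 4], "B": [5, 5, 5], "C": [6, 6, 6]}, index=idx)

df_total = in_merge({"price": df_price, "quantity": df_quantity, "quality": df_quality})
print(df_total)

# A single item can then be selected by its top-level label
print(df_total["A"])
```

Because the columns are sorted after the level swap, the order of the dict keys does not affect the final column order.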
