I have two DataFrames that are identically indexed, but each represents a different aspect of my full dataset.
For instance:
import pandas as pd
from datetime import date
df_price = pd.DataFrame(
    index=pd.date_range(start=date(2021, 1, 1), end=date(2021, 1, 3), freq="D"),
    columns=["A", "B", "C"],
    data={"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]},
)
df_quantity = pd.DataFrame(
    index=pd.date_range(start=date(2021, 1, 1), end=date(2021, 1, 3), freq="D"),
    columns=["A", "B", "C"],
    data={"A": [9, 8, 7], "B": [6, 5, 4], "C": [3, 2, 1]},
)
What I want is the equivalent of doing this:
index = pd.MultiIndex.from_product(
    [["A", "B", "C"], ["price", "quantity"]], names=["first", "second"]
)
df_total = pd.DataFrame(
    index=pd.date_range(start=date(2021, 1, 1), end=date(2021, 1, 3), freq="D"),
    columns=index,
    data=[[1, 9, 4, 6, 7, 3], [2, 8, 5, 5, 8, 2], [3, 7, 6, 4, 9, 1]],
)
first          A              B              C
second     price quantity price quantity price quantity
2021-01-01     1        9     4        6     7        3
2021-01-02     2        8     5        5     8        2
2021-01-03     3        7     6        4     9        1
Any ideas? I have tried the common methods of join and merge, but all I could do is add the columns with suffixes.
One option:
(i) join the two DataFrames
(ii) split column names on '_' and, because we want to use from_tuples, map the sublists to tuples
(iii) use pd.MultiIndex to convert the columns to a MultiIndex
(iv) sort column names to match the desired outcome
df_total = df_price.join(df_quantity, lsuffix='_price', rsuffix='_quantity')
df_total.columns = pd.MultiIndex.from_tuples(map(tuple, df_total.columns.str.split('_')))
df_total = df_total.reindex(df_total.columns.sort_values(), axis=1)
Output:
               A              B              C
           price quantity price quantity price quantity
2021-01-01     1        9     4        6     7        3
2021-01-02     2        8     5        5     8        2
2021-01-03     3        7     6        4     9        1
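One caveat with splitting on '_': if a column name already contains an underscore, the split yields more than two parts and the resulting tuples no longer line up. A minimal sketch of one way around this, assuming hypothetical column names such as unit_a and unit_b (they are not from the question), is to split only at the last underscore with str.rsplit and n=1:

```python
import pandas as pd

idx = pd.date_range("2021-01-01", periods=3, freq="D")

# Hypothetical frames whose column names already contain an underscore.
df_price = pd.DataFrame({"unit_a": [1, 2, 3], "unit_b": [4, 5, 6]}, index=idx)
df_quantity = pd.DataFrame({"unit_a": [9, 8, 7], "unit_b": [6, 5, 4]}, index=idx)

# Same join as above: every overlapping column gets a suffix.
df_total = df_price.join(df_quantity, lsuffix="_price", rsuffix="_quantity")

# rsplit with n=1 splits only at the LAST underscore, so "unit_a_price"
# becomes ["unit_a", "price"] rather than three pieces.
df_total.columns = pd.MultiIndex.from_tuples(
    map(tuple, df_total.columns.str.rsplit("_", n=1))
)

# Group the columns by item, then by attribute.
df_total = df_total.sort_index(axis=1)
print(df_total)
```

This still assumes that the suffixes themselves ('price', 'quantity') contain no underscore.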
I looked into pd.concat() and found this answer. It led me to write a function that solves my problem. It can handle multiple DataFrame objects at once:
from typing import Dict
import pandas as pd

def in_merge(dfs_dict: Dict[str, pd.DataFrame]) -> pd.DataFrame:
    return (
        pd.concat(
            dfs_dict,
            axis=1,
            names=["attribute", "item"],
        )
        .swaplevel(axis=1)
        .sort_index(axis=1)
    )
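The call that produced the output is not shown, so here is a sketch of how the function might be used with three frames. The function body is repeated so the snippet runs on its own. df_quality is an assumption: its values are reconstructed from the 'quality' columns of the result table, and the dictionary keys become the attribute level of the columns.

```python
from typing import Dict

import pandas as pd


def in_merge(dfs_dict: Dict[str, pd.DataFrame]) -> pd.DataFrame:
    # Copy of the function above: concat puts the dict keys on the outer
    # column level, swaplevel moves them inside, sort_index groups by item.
    return (
        pd.concat(dfs_dict, axis=1, names=["attribute", "item"])
        .swaplevel(axis=1)
        .sort_index(axis=1)
    )


idx = pd.date_range("2021-01-01", periods=3, freq="D")
df_price = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]}, index=idx)
df_quantity = pd.DataFrame({"A": [9, 8, 7], "B": [6, 5, 4], "C": [3, 2, 1]}, index=idx)
# Assumed third frame, inferred from the 'quality' columns of the result table.
df_quality = pd.DataFrame({"A": [4, 4, 4], "B": [5, 5, 5], "C": [6, 6, 6]}, index=idx)

df_total = in_merge(
    {"price": df_price, "quantity": df_quantity, "quality": df_quality}
)
print(df_total)
```

Because the columns are sorted, the order in which the frames appear in the dictionary does not affect the column order of the result.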
| | ('A', 'price') | ('A', 'quality') | ('A', 'quantity') | ('B', 'price') | ('B', 'quality') | ('B', 'quantity') | ('C', 'price') | ('C', 'quality') | ('C', 'quantity') |
|:--------------------|-----------------:|-------------------:|--------------------:|-----------------:|-------------------:|--------------------:|-----------------:|-------------------:|--------------------:|
| 2021-01-01 00:00:00 | 1 | 4 | 9 | 4 | 5 | 6 | 7 | 6 | 3 |
| 2021-01-02 00:00:00 | 2 | 4 | 8 | 5 | 5 | 5 | 8 | 6 | 2 |
| 2021-01-03 00:00:00 | 3 | 4 | 7 | 6 | 5 | 4 | 9 | 6 | 1 |