I would like to know how to join (or merge) an arbitrary number of DataFrames whose columns may share the same names. I know this has been asked several times, but I could not find a clear answer in any of the questions I have looked at.
import numpy as np
import pandas as pd

np.random.seed(1)
n_cols = 3
col_names = ["Ci"] + ["C" + str(i) for i in range(n_cols)]

def get_random_df():
    # 4 rows of random integers in [0, 10) plus an index column "Ci"
    values = np.random.randint(0, 10, size=(4, n_cols))
    index = np.arange(4).reshape([4, -1])
    return pd.DataFrame(np.concatenate([index, values], axis=1), columns=col_names).set_index("Ci")

dfs = []
for i in range(3):
    dfs.append(get_random_df())

print(dfs[0])
print(dfs[1])
With output:
    C0  C1  C2
Ci
0    5   8   9
1    5   0   0
2    1   7   6
3    9   2   4
    C0  C1  C2
Ci
0    5   2   4
1    2   4   7
2    7   9   1
3    7   0   6
If I join the DataFrames two at a time in a loop:
# join two per iteration
df = dfs[0]
for df_ in dfs[1:]:
    df = df.join(df_, how="outer", rsuffix="_r")
print("** 1 **")
print(df)
The final DataFrame has duplicate column names: for example, C0_r appears once for every DataFrame joined after the first.
** 1 **
    C0  C1  C2  C0_r  C1_r  C2_r  C0_r  C1_r  C2_r
Ci
0    5   8   9     5     2     4     9     9     7
1    5   0   0     2     4     7     6     9     1
2    1   7   6     7     9     1     0     1     8
3    9   2   4     7     0     6     8     3     9
This could easily be solved by providing a different suffix in each iteration. However, the documentation for DataFrame.join says: "Efficiently join multiple DataFrame objects by index at once by passing a list." So I tried the following:
# join all at once, without a suffix
df = dfs[0].join(dfs[1:], how="outer")
# fails

# join all at once, with a suffix
df = dfs[0].join(dfs[1:], how="outer", rsuffix="_r")
# fails
Both attempts fail because of the duplicate column names:
Indexes have overlapping values: Index(['C0', 'C1', 'C2'], dtype='object')
Question: is there a way to join multiple DataFrames automatically, without explicitly providing a different suffix each time?
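For reference, the per-iteration workaround mentioned above does not require typing each suffix by hand: the suffix can be derived from the loop position. This is a minimal sketch, assuming numeric suffixes such as _1 and _2 are acceptable (that naming scheme is an illustrative choice, not something from the original). The frames are rebuilt with the same seed and shapes as get_random_df().

```python
import numpy as np
import pandas as pd

# Rebuild three 4x3 frames like the question's get_random_df():
# integer columns C0..C2, index named "Ci"
np.random.seed(1)
dfs = [
    pd.DataFrame(
        np.random.randint(0, 10, size=(4, 3)),
        columns=["C0", "C1", "C2"],
        index=pd.Index(range(4), name="Ci"),
    )
    for _ in range(3)
]

df = dfs[0]
for i, df_ in enumerate(dfs[1:], start=1):
    # Assumed naming scheme: the i-th joined frame gets the suffix "_i",
    # so its overlapping columns become C0_i, C1_i, C2_i
    df = df.join(df_, how="outer", rsuffix=f"_{i}")

print(df)
```

This still performs len(dfs) - 1 pairwise joins rather than one join over a list.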
Wouldn't it be more readable to display your data like this?
You can get it by adding this line of code at the end:
pd.concat(dfs, axis=1, keys=[f"DF{i + 1}" for i in range(len(dfs))])
Output:
    DF1         DF2         DF3
    C0  C1  C2  C0  C1  C2  C0  C1  C2
Ci
0    5   8   9   5   2   4   9   9   7
1    5   0   0   2   4   7   6   9   1
2    1   7   6   7   9   1   0   1   8
3    9   2   4   7   0   6   8   3   9
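If flat, unique column names are needed instead of the MultiIndex above, here are two possible sketches. The first renames each frame's columns with DataFrame.add_suffix before a single list join. Because no names then overlap, join() needs no rsuffix. The second keeps the concat call from this answer and flattens its (key, column) tuples into single strings. The suffixes _0, _1, ... and the DF1_C0 naming are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild three 4x3 frames like the question's get_random_df()
np.random.seed(1)
dfs = [
    pd.DataFrame(
        np.random.randint(0, 10, size=(4, 3)),
        columns=["C0", "C1", "C2"],
        index=pd.Index(range(4), name="Ci"),
    )
    for _ in range(3)
]

# Option 1: give every frame its own (assumed) suffix up front, then join
# all at once; with no overlapping names the list join needs no rsuffix
renamed = [d.add_suffix(f"_{i}") for i, d in enumerate(dfs)]
joined = renamed[0].join(renamed[1:], how="outer")

# Option 2: concat with keys, then flatten the MultiIndex columns,
# turning each (key, column) tuple into a "key_column" string
stacked = pd.concat(dfs, axis=1, keys=[f"DF{i + 1}" for i in range(len(dfs))])
stacked.columns = [f"{key}_{col}" for key, col in stacked.columns]

print(joined)
print(stacked)
```

Both produce the same values. Option 1 suits cases where the suffix should come from the data source. Option 2 suits cases where the keyed concat is already in place.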