Assume the following DataFrames
df1:
id data1
1 10
2 200
3 3000
4 40000
df2:
id1 id2 data2
1 2 210
1 3 3010
1 4 40010
2 3 3200
2 4 40200
3 4 43000
I want the new df3:
id1 id2 data2 data11 data12
1 2 210 10 200
1 3 3010 10 3000
1 4 40010 10 40000
2 3 3200 200 3000
2 4 40200 200 40000
3 4 43000 3000 40000
What is the correct way to achieve this in pandas?
Edit: Please note that the specific data can be arbitrary. I chose these values just to show where everything comes from; no data element is correlated with any other.
Another example with different DataFrames, because the first one wasn't clear enough:
df4:
id data1
1 a
2 b
3 c
4 d
df5:
id1 id2 data2
1 2 e
1 3 f
1 4 g
2 3 h
2 4 i
3 4 j
I want the new df6:
id1 id2 data2 data11 data12
1 2 e a b
1 3 f a c
1 4 g a d
2 3 h b c
2 4 i b d
3 4 j c d
Edit2: data11 and data12 are simply a copy of data1, with the corresponding id id1 or id2.
1. First merge df2 and df1 on the id1 and id columns.
2. Rename data1 to data11.
3. Drop the id column.
4. Now merge df3 and df1 on id2 and id, then rename data1 to data12 and drop id again.
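import pandas as pd

# Steps 1-3: look up data1 for id1, rename it to data11, drop the helper id column
df3 = pd.merge(df2, df1, left_on=['id1'], right_on=['id'], how='left')
df3.rename(columns={'data1': 'data11'}, inplace=True)
df3.drop('id', axis=1, inplace=True)
# Step 4: repeat the lookup for id2, this time naming the result data12
df3 = pd.merge(df3, df1, left_on=['id2'], right_on=['id'], how='left')
df3.rename(columns={'data1': 'data12'}, inplace=True)
df3.drop('id', axis=1, inplace=True)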
I hope this solves your problem.
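As an alternative sketch of the same lookup (a different technique from the two merges above), Series.map can fetch data1 for each id column. This assumes the id values in df1 are unique; the name lookup is introduced here for illustration, and the sample values are the question's df1 and df2.

```python
import pandas as pd

df1 = pd.DataFrame({"id": [1, 2, 3, 4], "data1": [10, 200, 3000, 40000]})
df2 = pd.DataFrame({
    "id1": [1, 1, 1, 2, 2, 3],
    "id2": [2, 3, 4, 3, 4, 4],
    "data2": [210, 3010, 40010, 3200, 40200, 43000],
})

# Lookup table id -> data1 (assumption: id is unique in df1).
lookup = df1.set_index("id")["data1"]

df3 = df2.copy()
# Each id column is translated to its data1 value through the lookup.
df3["data11"] = df3["id1"].map(lookup)
df3["data12"] = df3["id2"].map(lookup)
print(df3)
```

Ids that are missing from df1 would come out as NaN, which mirrors what a left merge does.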
Try this:
# merge dataframes, first on id and id1 then on id2
df3 = pd.merge(df1, df2, left_on="id", right_on="id1", how="inner")
df3 = pd.merge(df1, df3, left_on="id", right_on="id2", how="inner")
# rename and reorder columns
cols = [ 'id1', 'id2', 'data2', 'data1_y', 'data1_x']
df3 = df3[cols]
new_cols = ["id1", "id2", "data2", "data11", "data12"]
df3.columns = new_cols
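# sort and renumber the rows so the index runs 0..n-1 as in the output below
df3 = df3.sort_values(["id1", "id2"]).reset_index(drop=True)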
print(df3)
This prints out:
id1 id2 data2 data11 data12
0 1 2 210 10 200
1 1 3 3010 10 3000
2 1 4 40010 10 40000
3 2 3 3200 200 3000
4 2 4 40200 200 40000
5 3 4 43000 3000 40000
One solution to your problem is:
import pandas as pd

data1 = {'id': [1, 2, 3, 4],
         'data1': [10, 200, 3000, 40000]}
data2 = {'id1': [1, 1, 1, 2, 2, 3],
         'id2': [2, 3, 4, 3, 4, 4],
         'data2': [210, 3010, 40010, 3200, 40200, 43000]}
df1 = pd.DataFrame(data1)
df2 = pd.DataFrame(data2)
df1:
id data1
1 10
2 200
3 3000
4 40000
df2:
id1 id2 data2
1 2 210
1 3 3010
1 4 40010
2 3 3200
2 4 40200
3 4 43000
df3 = df2.set_index('id1').join(df1.set_index('id'))
df3.index.names = ['id1']
df3.reset_index(inplace=True)
final = df3.set_index('id2').join(df1.set_index('id'), rsuffix='2')
final.index.names = ['id2']
final.reset_index(inplace=True)
final[['id1','id2','data2','data1','data12']].sort_values('id1')
output df:
id1 id2 data2 data1 data12
1 2 210 10 200
1 3 3010 10 3000
1 4 40010 10 40000
2 3 3200 200 3000
2 4 40200 200 40000
3 4 43000 3000 40000
I hope this will help you.
merge in a for loop with range and f-string
One way to generalise this, and to make it easier to extend beyond two lookups, is to use a list comprehension with a for loop over range, building the column names with f-strings.
After that we drop the duplicated columns:
dfs = [df2.merge(df1,
left_on=f'id{x+1}',
right_on='id',
how='left').rename(columns={'data1':f'data1{x+1}'}) for x in range(2)]
df = pd.concat(dfs, axis=1).drop('id', axis=1)
df = df.loc[:, ~df.columns.duplicated()]
Output
id1 id2 data2 data11 data12
0 1 2 210 10 200
1 1 3 3010 10 3000
2 1 4 40010 10 40000
3 2 3 3200 200 3000
4 2 4 40200 200 40000
5 3 4 43000 3000 40000
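A related sketch, assuming the goal is the df6 layout from the question: renaming the lookup columns before each merge means there is no concat and no duplicated columns to clean up afterwards. The function attach_lookups and its parameter names are hypothetical, introduced only for illustration; they are not part of pandas.

```python
import pandas as pd

def attach_lookups(pairs, lookup, key_cols, lookup_key="id", value_col="data1"):
    """Hypothetical helper: add one copy of value_col per key column."""
    out = pairs.copy()
    for n, key in enumerate(key_cols, start=1):
        # Rename first so the right-hand frame shares only the join key with out.
        right = lookup[[lookup_key, value_col]].rename(
            columns={lookup_key: key, value_col: f"{value_col}{n}"}
        )
        out = out.merge(right, on=key, how="left")
    return out

df4 = pd.DataFrame({"id": [1, 2, 3, 4], "data1": list("abcd")})
df5 = pd.DataFrame({
    "id1": [1, 1, 1, 2, 2, 3],
    "id2": [2, 3, 4, 3, 4, 4],
    "data2": list("efghij"),
})

df6 = attach_lookups(df5, df4, ["id1", "id2"])
print(df6)
```

Adding a third key column would only require extending the key_cols list.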
Use two left merges on df2, one on column id1 and one on column id2:
import pandas as pd
from io import StringIO

txt = """id,data1
1,a
2,b
3,c
4,d
"""
f = StringIO(txt)
df1 = pd.read_table(f, sep=',')
df1['id'] = df1['id'].astype(int)
txt="""id1,id2,data2
1,2,e
1,3,f
1,4,g
2,3,h
2,4,i
3,4,j
"""
f = StringIO(txt)
df2 = pd.read_table(f,sep =',')
df2['id1']=df2['id1'].astype(int)
df2['id2']=df2['id2'].astype(int)
left_on='id1'
right_on='id'
suffix='_1'
df2=df2.merge(df1, how='left', left_on=left_on, right_on=right_on,
suffixes=("", suffix))
left_on='id2'
right_on='id'
suffix='_2'
df2=df2.merge(df1, how='left', left_on=left_on, right_on=right_on,
suffixes=("", suffix))
print(df2)
output
id1 id2 data2 id data1 id_2 data1_2
0 1 2 e 1 a 2 b
1 1 3 f 1 a 3 c
2 1 4 g 1 a 4 d
3 2 3 h 2 b 3 c
4 2 4 i 2 b 4 d
5 3 4 j 3 c 4 d
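To get from this output to the df6 layout in the question, the helper id columns can be dropped and the data columns renamed. A minimal sketch, rebuilding the same merged frame from the question's df4 and df5 (the names merged and df6 are used here for illustration only):

```python
import pandas as pd

df4 = pd.DataFrame({"id": [1, 2, 3, 4], "data1": list("abcd")})
df5 = pd.DataFrame({
    "id1": [1, 1, 1, 2, 2, 3],
    "id2": [2, 3, 4, 3, 4, 4],
    "data2": list("efghij"),
})

# Same two left merges as above; the second one suffixes the clashing
# right-hand columns, giving id_2 and data1_2.
merged = (
    df5.merge(df4, how="left", left_on="id1", right_on="id", suffixes=("", "_1"))
       .merge(df4, how="left", left_on="id2", right_on="id", suffixes=("", "_2"))
)

# Drop the join helper columns and rename to the names asked for.
df6 = merged.drop(columns=["id", "id_2"]).rename(
    columns={"data1": "data11", "data1_2": "data12"}
)
print(df6)
```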