DataFrame merging in Pandas
I have two dataframes. The first (df1) contains Name, ID and PIN. The second (df2) contains Identifier, City and Country. The dataframes look like this:
import pandas as pd
df1 = pd.DataFrame({"Name": ["Sam", "Ajay", "Lee", "Lee Yong Dae", "Cai Yun"], "ID": ["S01", "A01", "L02", "L03", "C01"], "PIN": ["SM392", "AA09", "Lee101", "Lee201", "C101"]})
df2 = pd.DataFrame({"Identifier": ["Sam", "L02", "C101"], "City": ["Moscow", "Seoul", "Beijing"], "Country": ["Russia", "Korea", "China"]})
I want to merge the dataframes whenever the Name, ID or PIN of a df1 row matches an Identifier in df2. The expected output is:
      City Country          Name     PIN   ID
0   Moscow  Russia           Sam   SM392  S01
1        0       0          Ajay    AA09  A01
2    Seoul   Korea           Lee  Lee101  L02
3        0       0  Lee Yong Dae  Lee201  L03
4  Beijing   China       Cai Yun    C101  C01
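The matching rule above (a row matches if any one of Name, ID or PIN equals an Identifier) can also be expressed without merges, by using df2 as a lookup table with Series.map. This is a minimal sketch, assuming Identifier values in df2 are unique and that, if several columns of a row match, Name wins over ID and ID over PIN; the helper `match_any` is a hypothetical name for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"Name": ["Sam", "Ajay", "Lee", "Lee Yong Dae", "Cai Yun"],
                    "ID": ["S01", "A01", "L02", "L03", "C01"],
                    "PIN": ["SM392", "AA09", "Lee101", "Lee201", "C101"]})
df2 = pd.DataFrame({"Identifier": ["Sam", "L02", "C101"],
                    "City": ["Moscow", "Seoul", "Beijing"],
                    "Country": ["Russia", "Korea", "China"]})

# Index df2 by Identifier so each column can be used as a key -> value lookup
# (assumes Identifier is unique)
lookup = df2.set_index("Identifier")

def match_any(col):
    # Hypothetical helper: try Name first, then ID, then PIN.
    # Series.map with a Series looks values up in its index; misses become NaN.
    result = df1["Name"].map(lookup[col])
    for key in ["ID", "PIN"]:
        result = result.fillna(df1[key].map(lookup[col]))
    return result

out = df1.assign(City=match_any("City"), Country=match_any("Country"))
# Reorder columns and use 0 for unmatched rows, as in the expected output
out = out[["City", "Country", "Name", "PIN", "ID"]].fillna(0)
print(out)
```

Each df1 row is looked up independently, so the row order and count of df1 are preserved without any need to realign merge results.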
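This may not be the most elegant solution, but it works for me. You have to do three separate merges and then combine the results.
The code below gives the expected output (with NaN instead of 0 for the unmatched rows of the DataFrame).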
import pandas as pd

# Initial data
df1 = pd.DataFrame({"Name": ["Sam", "Ajay", "Lee", "Lee Yong Dae", "Cai Yun"], "ID": ["S01", "A01", "L02", "L03", "C01"], "PIN": ["SM392", "AA09", "Lee101", "Lee201", "C101"]})
df2 = pd.DataFrame({"Identifier": ["Sam", "L02", "C101"], "City": ["Moscow", "Seoul", "Beijing"], "Country": ["Russia", "Korea", "China"]})

def merge_three(df1, df2):
    # Perform three separate merges, one per candidate key.
    # how='left' keeps df1's rows in their original order, so the three results
    # line up row by row (recent pandas sorts the keys of an outer merge,
    # which would break this alignment)
    df3 = df1.merge(df2, how='left', left_on='ID', right_on='Identifier')
    df4 = df1.merge(df2, how='left', left_on='Name', right_on='Identifier')
    df5 = df1.merge(df2, how='left', left_on='PIN', right_on='Identifier')
    # Copy 2nd and 3rd merge results to df3
    df3['City_x'] = df4['City']
    df3['Country_x'] = df4['Country']
    df3['City_y'] = df5['City']
    df3['Country_y'] = df5['Country']
    # Merge the correct City and Country values by taking the first non-NaN one
    # (np.max fails on Python 3 because str and float NaN cannot be compared)
    df6 = df3[['City', 'Country', 'Name', 'PIN', 'ID']].copy()
    df6['City'] = df3['City'].fillna(df3['City_x']).fillna(df3['City_y'])
    df6['Country'] = df3['Country'].fillna(df3['Country_x']).fillna(df3['Country_y'])
    # With how='left' there are no extra un-matched rows to remove
    return df6

df_out = merge_three(df1, df2)
Output:
df_out
City Country Name PIN ID
0 Moscow Russia Sam SM392 S01
1 NaN NaN Ajay AA09 A01
2 Seoul Korea Lee Lee101 L02
3 NaN NaN Lee Yong Dae Lee201 L03
4 Beijing China Cai Yun C101 C01
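The three merges above can also be collapsed into a single merge by reshaping df1 to long form with DataFrame.melt, so that every (row, candidate key) pair is tried against df2 at once. This is a minimal sketch, assuming the first match found for a row is the one to keep; the names `long` and `hits` and the `row`/`key_col` columns are illustrative. It also fills unmatched cells with 0, as the question asked.

```python
import pandas as pd

df1 = pd.DataFrame({"Name": ["Sam", "Ajay", "Lee", "Lee Yong Dae", "Cai Yun"],
                    "ID": ["S01", "A01", "L02", "L03", "C01"],
                    "PIN": ["SM392", "AA09", "Lee101", "Lee201", "C101"]})
df2 = pd.DataFrame({"Identifier": ["Sam", "L02", "C101"],
                    "City": ["Moscow", "Seoul", "Beijing"],
                    "Country": ["Russia", "Korea", "China"]})

# Long form: one row per (original row, candidate column) with its value
long = (df1.reset_index()
           .rename(columns={"index": "row"})
           .melt(id_vars="row", value_vars=["Name", "ID", "PIN"],
                 var_name="key_col", value_name="Identifier"))

# A single inner merge finds every hit; keep the first hit per original row
hits = (long.merge(df2, on="Identifier")
            .drop_duplicates(subset="row")
            .set_index("row")[["City", "Country"]])

# Join the hits back onto df1 by row label; unmatched rows become 0
out = df1.join(hits).fillna(0)[["City", "Country", "Name", "PIN", "ID"]]
print(out)
```

Because the result is joined back on df1's own index, no positional alignment between separate merge results is needed.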
Not sure, but maybe this is what you want:
a = df1.merge(df2, left_on='ID', right_on='Identifier')
b = df1.merge(df2, left_on='Name', right_on='Identifier')
c = df1.merge(df2, left_on='PIN', right_on='Identifier')
df = pd.concat([a, b, c])  # DataFrame.append was removed in pandas 2.0
df
ID Name PIN City Country Identifier
0 L02 Lee Lee101 Seoul Korea L02
0 S01 Sam SM392 Moscow Russia Sam
0 C01 Cai Yun C101 Beijing China C101
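Because these are inner merges, the stacked result contains only the matched rows, not the full table from the question. One way to recover the full table is to left-merge the matches back onto df1. This is a sketch, assuming ID is unique per df1 row so that it can serve as the row key; `parts` and `matches` are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame({"Name": ["Sam", "Ajay", "Lee", "Lee Yong Dae", "Cai Yun"],
                    "ID": ["S01", "A01", "L02", "L03", "C01"],
                    "PIN": ["SM392", "AA09", "Lee101", "Lee201", "C101"]})
df2 = pd.DataFrame({"Identifier": ["Sam", "L02", "C101"],
                    "City": ["Moscow", "Seoul", "Beijing"],
                    "Country": ["Russia", "Korea", "China"]})

# One inner merge per candidate key, as in the answer above
parts = [df1.merge(df2, left_on=key, right_on="Identifier")
         for key in ["ID", "Name", "PIN"]]
# Stack the matches with pd.concat; keep one match per ID (assumed unique)
matches = pd.concat(parts, ignore_index=True).drop_duplicates(subset="ID")

# Left-merge back onto df1 so unmatched rows are kept, with NaN City/Country
out = df1.merge(matches[["ID", "City", "Country"]], on="ID", how="left")
out = out[["City", "Country", "Name", "PIN", "ID"]]
print(out)
```

A left merge preserves df1's row order, so the result lines up with the expected output; `.fillna(0)` would turn the NaN cells into 0 if that is preferred.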