Python Pandas: Merging data frames on multiple conditions
I want to merge two data frames, both obtained through SQL, on multiple conditions.
df1 and df2 look like this:
df1:
Customer ID   Cluster ID   Customer Zone ID
CUS1001.A     CUS1001.X    CUS1000
CUS1001.B     CUS1001.X    CUS1000
CUS1001.C     CUS1001.X    CUS1000
CUS1001.D     CUS1001.X    CUS1000
CUS1001.E     CUS1001.X    CUS1000
CUS2001.A     CUS2001.X    CUS2000
df2:
Complain ID   RegistrationNumber   Status
CUS3501.A     99231                open
CUS1001.B     21340                open
CUS1001.X     32100                open
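To make the example reproducible without a database, the two frames can be rebuilt by hand. This is only a sketch: in the question the data comes from SQL queries, and the column names here are copied from the tables above.

```python
import pandas as pd

# Hand-built copies of the two frames shown above
# (in the question they are loaded via SQL).
df1 = pd.DataFrame({
    "Customer ID": ["CUS1001.A", "CUS1001.B", "CUS1001.C",
                    "CUS1001.D", "CUS1001.E", "CUS2001.A"],
    "Cluster ID": ["CUS1001.X"] * 5 + ["CUS2001.X"],
    "Customer Zone ID": ["CUS1000"] * 5 + ["CUS2000"],
})

df2 = pd.DataFrame({
    "Complain ID": ["CUS3501.A", "CUS1001.B", "CUS1001.X"],
    "RegistrationNumber": [99231, 21340, 32100],
    "Status": ["open", "open", "open"],
})
```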
I want to merge the two data frames under the following conditions:
if(Complain ID == Customer ID):
Merge on Customer ID
Elif(Complain ID == Cluster ID):
Merge on Customer ID
Elif (Complain ID == Customer Zone ID):
Merge on Customer ID
Else:
Merge empty row.
The final result should look like this:
Customer ID   Cluster ID   Customer Zone ID   Complain ID   RegistrationNumber   Status
CUS1001.A     CUS1001.X    CUS1000            CUS1001.X     32100                open
CUS1001.B     CUS1001.X    CUS1000            CUS1001.B     21340                open
CUS1001.C     CUS1001.X    CUS1000            CUS1001.X     32100                open
...
CUS2001.A     CUS2001.X    CUS2000            0             0                    0
Please help!
Try this, using pandas melt, merge and concat:
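import pandas as pd

# Stack the three ID columns into long form, then look each ID up in df2.
df = pd.melt(df1)
df = df.merge(df2, left_on='value', right_on='Complain ID', how='left')
# 'number' is the df1 row position that each melted value came from.
df['number'] = df.groupby('variable').cumcount()
# Back-fill within each df1 row so its first melted row holds the first match.
cols = ['Complain ID', 'RegistrationNumber', 'Status']
df[cols] = df.groupby('number')[cols].bfill()
Target = pd.concat([df1, df.loc[:len(df1) - 1, cols]], axis=1).fillna(0)
Target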
Out[39]:
Customer ID Cluster ID Customer Zone ID Complain ID RegistrationNumber \
0 CUS1001.A CUS1001.X CUS1000 CUS1001.X 32100.0
1 CUS1001.B CUS1001.X CUS1000 CUS1001.B 21340.0
2 CUS1001.C CUS1001.X CUS1000 CUS1001.X 32100.0
3 CUS1001.D CUS1001.X CUS1000 CUS1001.X 32100.0
4 CUS1001.E CUS1001.X CUS1000 CUS1001.X 32100.0
5 CUS2001.A CUS2001.X CUS2000 0 0.0
Status
0 open
1 open
2 open
3 open
4 open
5 0
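The key step in this answer is the per-row back-fill after melting. A toy sketch of just that step may make it easier to follow; the frame `wide` and the column `hit` are made up for illustration and stand in for df1 and the columns merged in from df2.

```python
import numpy as np
import pandas as pd

# Hypothetical two-row frame standing in for df1.
wide = pd.DataFrame({"a": ["x1", "x2"], "b": ["y1", "y2"]})

# Long form: all values of column "a" first, then all values of "b".
long = pd.melt(wide)

# Hypothetical lookup result: x2 and y1 found a match, x1 and y2 did not.
long["hit"] = [np.nan, "H2", "H1", np.nan]

# cumcount within each melted column recovers the original row position.
long["number"] = long.groupby("variable").cumcount()

# Back-fill inside each original row: the first melted row of a group
# ends up holding the first non-missing match in column order.
long["hit"] = long.groupby("number")["hit"].bfill()

# The first len(wide) rows correspond to the first column, one per row.
first_hits = long["hit"].iloc[:len(wide)].tolist()
print(first_hits)
```

Because the melted rows keep the column order of the original frame, the back-fill also encodes the priority: a match on an earlier column wins over a match on a later one.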
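Alternatively, use numpy's intersect1d; personally I prefer this approach to the previous one. Note that the column names below are written without spaces (CustomerID, ClusterID, ComplainID).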
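import numpy as np
# Create the column with [] (attribute assignment would not add a new column).
df1['MatchId'] = [np.intersect1d(x, df2.ComplainID.values) for x in df1[['CustomerID', 'ClusterID']].values]
# intersect1d returns sorted matches; keep the first one, or NaN if there is none.
df1['MatchId'] = df1['MatchId'].apply(lambda m: m[0] if len(m) else np.nan)
df1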
Out[307]:
CustomerID ClusterID CustomerZoneID MatchId
0 CUS1001.A CUS1001.X CUS1000 CUS1001.X
1 CUS1001.B CUS1001.X CUS1000 CUS1001.B
2 CUS1001.C CUS1001.X CUS1000 CUS1001.X
3 CUS1001.D CUS1001.X CUS1000 CUS1001.X
4 CUS1001.E CUS1001.X CUS1000 CUS1001.X
5 CUS2001.A CUS2001.X CUS2000 NaN
df1.merge(df2,left_on='MatchId',right_on='ComplainID',how='left')
Out[311]:
CustomerID ClusterID CustomerZoneID MatchId ComplainID \
0 CUS1001.A CUS1001.X CUS1000 CUS1001.X CUS1001.X
1 CUS1001.B CUS1001.X CUS1000 CUS1001.B CUS1001.B
2 CUS1001.C CUS1001.X CUS1000 CUS1001.X CUS1001.X
3 CUS1001.D CUS1001.X CUS1000 CUS1001.X CUS1001.X
4 CUS1001.E CUS1001.X CUS1000 CUS1001.X CUS1001.X
5 CUS2001.A CUS2001.X CUS2000 NaN NaN
RegistrationNumber Status
0 32100.0 open
1 21340.0 open
2 32100.0 open
3 32100.0 open
4 32100.0 open
5 NaN NaN
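Neither answer spells out the priority order from the question (Customer ID first, then Cluster ID, then Customer Zone ID), and the intersect1d version does not look at Customer Zone ID at all. The following is a minimal sketch of one way to make that order explicit. It is not taken from either answer: the helper column `MatchId` and the variable names are illustrative, and the column names with spaces follow the question.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Customer ID": ["CUS1001.A", "CUS1001.B", "CUS1001.C",
                    "CUS1001.D", "CUS1001.E", "CUS2001.A"],
    "Cluster ID": ["CUS1001.X"] * 5 + ["CUS2001.X"],
    "Customer Zone ID": ["CUS1000"] * 5 + ["CUS2000"],
})
df2 = pd.DataFrame({
    "Complain ID": ["CUS3501.A", "CUS1001.B", "CUS1001.X"],
    "RegistrationNumber": [99231, 21340, 32100],
    "Status": ["open", "open", "open"],
})

# Columns to try, in priority order (assumed from the question's if/elif chain).
priority = ["Customer ID", "Cluster ID", "Customer Zone ID"]
known_ids = df2["Complain ID"].tolist()

match_key = None
for col in priority:
    # Keep the ID only where it appears in df2; otherwise leave it missing.
    candidate = df1[col].where(df1[col].isin(known_ids))
    # Earlier columns win; later ones only fill rows that are still missing.
    match_key = candidate if match_key is None else match_key.combine_first(candidate)

# Left-merge on the chosen key, then drop the helper column.
result = (
    df1.assign(MatchId=match_key)
       .merge(df2, left_on="MatchId", right_on="Complain ID", how="left")
       .drop(columns="MatchId")
)
print(result)
```

Rows without any match keep NaN in the df2 columns; calling `result.fillna(0)` afterwards would reproduce the zeros shown in the desired output.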