How to join two dataframes where IDs do not match and create a new column to represent which dataframe each ID came from?
I have two dataframes like this:
DF1:
id column1 column2
1 30 90
2 1 2
DF2:
id column1 column2
1 30 90
3 1 2
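I want to merge these two dataframes (their column names are the same), keeping only the rows whose ID does not match, and then create a new column that says which dataframe each ID came from. How can I do this?
Final merged df: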
id column1 column2 df_name
2 1 2 df1
3 1 2 df2
Edit:
Can the final df include all columns from both dataframes?
id column1.df1 column2.df1 column1.df2 column2.df2 df_name
2 30 90 30 90 df1
3 1 2 1 2 df2
First, concat the DataFrames together:
df = (pd.concat([df1, df2], keys=('df1','df2'))
.rename_axis(('df_name','idx'))
.reset_index(level=1, drop=True)
.reset_index())
print (df)
df_name id column1 column2
0 df1 1 30 90
1 df1 2 1 2
2 df2 1 30 90
3 df2 3 1 2
Then get all the ids that appear in both DataFrames:
a = df1.merge(df2, on='id')['id']
Finally, filter with isin:
df = df[~df['id'].isin(a)]
print (df)
df_name id column1 column2
1 df1 2 1 2
3 df2 3 1 2
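The three steps above can be put together as a self-contained sketch. The sample frames are copied from the question; the variable names `stacked`, `common_ids`, `result`, and `alt` are only illustrative. The last part shows a different technique that is not from the answer (`assign` plus `drop_duplicates(keep=False)`), and it assumes ids are unique within each frame.

```python
import pandas as pd

# Sample frames copied from the question
df1 = pd.DataFrame({"id": [1, 2], "column1": [30, 1], "column2": [90, 2]})
df2 = pd.DataFrame({"id": [1, 3], "column1": [30, 1], "column2": [90, 2]})

# Step 1: stack both frames; the keys become an index level naming the source
stacked = (
    pd.concat([df1, df2], keys=("df1", "df2"))
    .rename_axis(("df_name", "idx"))
    .reset_index(level=1, drop=True)  # drop the original row positions
    .reset_index()                    # turn df_name into a regular column
)

# Step 2: ids present in both frames (inner merge on id)
common_ids = df1.merge(df2, on="id")["id"]

# Step 3: keep only rows whose id is not shared
result = stacked[~stacked["id"].isin(common_ids)]
print(result)

# Alternative (not from the answer, assumes ids are unique within each frame):
# tag each frame, stack, then drop every id that occurs more than once.
alt = pd.concat(
    [df1.assign(df_name="df1"), df2.assign(df_name="df2")],
    ignore_index=True,
).drop_duplicates(subset="id", keep=False)
print(alt)
```

If an id can repeat inside a single frame, the alternative would drop those rows as well, so the `isin` filter is probably the safer choice in that case.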
Edit:
A solution similar to @WB's, only with the on='id' and suffixes parameters added:
df = (df1.merge(df2,indicator=True,how='outer', on='id', suffixes=('_df1','_df2'))
.query("_merge != 'both'"))
df['_merge'] = df['_merge'].map({'left_only':'df1','right_only':'df2'})
print (df)
id column1_df1 column2_df1 column1_df2 column2_df2 _merge
1 2 1.0 2.0 NaN NaN df1
2 3 NaN NaN 1.0 2.0 df2
If you need all rows, including those whose id appears in both DataFrames:
df = df1.merge(df2,indicator=True,how='outer', on='id', suffixes=('_df1','_df2'))
df['_merge'] = df['_merge'].map({'left_only':'df1','right_only':'df2', 'both':'df1+df2'})
print (df)
id column1_df1 column2_df1 column1_df2 column2_df2 _merge
0 1 30.0 90.0 30.0 90.0 df1+df2
1 2 1.0 2.0 NaN NaN df1
2 3 NaN NaN 1.0 2.0 df2
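For reference, here is a runnable sketch of the outer-merge approach that produces both variants from one merged frame. The sample data comes from the question; renaming the indicator to `df_name` (the column name the question asks for) and the names `merged`, `labels`, and `unmatched` are my own choices, not part of the answer. The indicator is converted to plain strings before mapping so the result does not depend on how a given pandas version maps categorical values.

```python
import pandas as pd

# Sample frames copied from the question
df1 = pd.DataFrame({"id": [1, 2], "column1": [30, 1], "column2": [90, 2]})
df2 = pd.DataFrame({"id": [1, 3], "column1": [30, 1], "column2": [90, 2]})

# Outer merge on id only; indicator=True adds a `_merge` column that records
# whether each row came from the left frame, the right frame, or both.
merged = df1.merge(
    df2, on="id", how="outer", indicator=True, suffixes=("_df1", "_df2")
)

# Translate the indicator values into source names.
labels = {"left_only": "df1", "right_only": "df2", "both": "df1+df2"}
merged["df_name"] = merged["_merge"].astype(str).map(labels)
merged = merged.drop(columns="_merge")

# Variant 1: every id, with shared ids labelled "df1+df2"
print(merged)

# Variant 2: only the ids that occur in exactly one frame
unmatched = merged[merged["df_name"] != "df1+df2"]
print(unmatched)
```

The value columns become floats because the side that lacks an id is filled with NaN.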
Let's use merge:
df=df1.merge(df2,indicator = True,how='outer').loc[lambda x : x['_merge'].ne('both')]
df['df_name']=df['_merge'].map({'left_only':'df1','right_only':'df2'})
df
Out[328]:
id column1 column2 _merge df_name
1 2 1 2 left_only df1
2 3 1 2 right_only df2
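One thing to keep in mind with this version: without `on`, merge joins on every column the two frames share (id, column1 and column2 here), so a row only counts as 'both' when all three values agree. The sketch below illustrates the difference with a hypothetical `df2_changed`, which is not from the question: id 1 is shared, but its column1 value differs.

```python
import pandas as pd

df1 = pd.DataFrame({"id": [1, 2], "column1": [30, 1], "column2": [90, 2]})
# Hypothetical variant of df2: id 1 is shared, but column1 is 31 instead of 30
df2_changed = pd.DataFrame({"id": [1, 3], "column1": [31, 1], "column2": [90, 2]})

# No `on`: all shared columns (id, column1, column2) form the join key,
# so the two id-1 rows do not match each other.
by_all_columns = df1.merge(df2_changed, how="outer", indicator=True)

# on="id": only the id is compared, so id 1 is reported as 'both'.
by_id = df1.merge(
    df2_changed, on="id", how="outer", indicator=True, suffixes=("_df1", "_df2")
)

print(by_all_columns)
print(by_id)
```

With the question's original data the two calls agree on which ids are unmatched, since the id-1 rows are identical in both frames.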