Join pandas DataFrames based on column values

Question

I'm new to pandas DataFrames and am having some trouble joining two tables.

The first df has only 3 columns:

DF1:

item_id    position    document_id
336        1           10
337        2           10
338        3           10
1001       1           11
1002       2           11
1003       3           11
38         10          146

The second has exactly the same two columns (plus many others):

DF2:

item_id    document_id    col1    col2   col3    ...
337        10             ...     ...    ...
1002       11             ...     ...    ...
1003       11             ...     ...    ...

What I need is an operation that, in SQL, would look like this:

DF1 join DF2 on 
DF1.document_id = DF2.document_id
and
DF1.item_id = DF2.item_id

As a result, I want to see DF2, complemented with the 'position' column:

item_id    document_id    position    col1   col2   col3   ...

What is a good way to do this using pandas?

Answer 1

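I think you need merge with the default inner join, but this assumes that neither frame contains duplicated combinations of values in the two key columns:

print(df2)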
   item_id  document_id col1  col2  col3
0      337           10    s     4     7
1     1002           11    d     5     8
2     1003           11    f     7     0

df = pd.merge(df1, df2, on=['document_id', 'item_id'])
print(df)
   item_id  position  document_id col1  col2  col3
0      337         2           10    s     4     7
1     1002         2           11    d     5     8
2     1003         3           11    f     7     0
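For anyone who wants to run this end to end, here is a minimal self-contained sketch. The frames are rebuilt by hand from the tables in the question, and col1 to col3 reuse the placeholder values printed above. The validate argument is an optional extra, not part of the original answer: it makes pandas raise a MergeError if a key pair turns out to be duplicated.

```python
import pandas as pd

# Rebuild DF1 and DF2 from the tables in the question.
df1 = pd.DataFrame({
    "item_id": [336, 337, 338, 1001, 1002, 1003, 38],
    "position": [1, 2, 3, 1, 2, 3, 10],
    "document_id": [10, 10, 10, 11, 11, 11, 146],
})
df2 = pd.DataFrame({
    "item_id": [337, 1002, 1003],
    "document_id": [10, 11, 11],
    "col1": ["s", "d", "f"],
    "col2": [4, 5, 7],
    "col3": [7, 8, 0],
})

# Inner join on both key columns (how="inner" is the default).
df = pd.merge(df1, df2, on=["document_id", "item_id"])

# Optional safety check: fail loudly if either side repeats a key pair.
checked = pd.merge(
    df1, df2, on=["document_id", "item_id"], validate="one_to_one"
)
print(df)
```

Only the three key pairs present in both frames survive the inner join; the row for item_id 38 and the other unmatched rows of df1 are dropped.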

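But if you need the position column in the third position:

df = pd.merge(df2, df1, on=['document_id', 'item_id'])
cols = df.columns.tolist()
# first two columns, then the last one (position), then the rest
df = df[cols[:2] + cols[-1:] + cols[2:-1]]
print(df)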
   item_id  document_id  position col1  col2  col3
0      337           10         2    s     4     7
1     1002           11         2    d     5     8
2     1003           11         3    f     7     0
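An alternative to slicing the column list, offered as a sketch rather than as part of the original answer, is to pop the column and insert it back at the wanted index. The frames are again rebuilt by hand from the question.

```python
import pandas as pd

df1 = pd.DataFrame({
    "item_id": [336, 337, 338, 1001, 1002, 1003, 38],
    "position": [1, 2, 3, 1, 2, 3, 10],
    "document_id": [10, 10, 10, 11, 11, 11, 146],
})
df2 = pd.DataFrame({
    "item_id": [337, 1002, 1003],
    "document_id": [10, 11, 11],
    "col1": ["s", "d", "f"],
    "col2": [4, 5, 7],
    "col3": [7, 8, 0],
})

# df2 on the left, so 'position' ends up as the last column.
df = pd.merge(df2, df1, on=["document_id", "item_id"])

# Remove 'position' from the end and re-insert it at index 2 (third column).
df.insert(2, "position", df.pop("position"))
print(df)
```

This modifies df in place and does not depend on how many extra columns df2 carries.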

Answer 2

If you are merging on all common columns, as in the OP, you don't even need to pass on=; simply calling merge() will do the job.

merged_df = df1.merge(df2)

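The reason is that, under the hood, if on= is not passed, pd.Index.intersection is called on the columns to determine the common columns, and the merge is done on all of them.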
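The behaviour can be checked from the outside with a short sketch. It computes the column intersection explicitly and compares the result with the plain merge() call; the internal call itself is an implementation detail and may differ between pandas versions. The frames are rebuilt by hand from the question.

```python
import pandas as pd

df1 = pd.DataFrame({
    "item_id": [336, 337, 338, 1001, 1002, 1003, 38],
    "position": [1, 2, 3, 1, 2, 3, 10],
    "document_id": [10, 10, 10, 11, 11, 11, 146],
})
df2 = pd.DataFrame({
    "item_id": [337, 1002, 1003],
    "document_id": [10, 11, 11],
    "col1": ["s", "d", "f"],
    "col2": [4, 5, 7],
    "col3": [7, 8, 0],
})

# Columns present in both frames: these become the join keys.
common = df1.columns.intersection(df2.columns)

# No on= given: pandas joins on the common columns.
merged_df = df1.merge(df2)

# Same join with the keys spelled out.
explicit = df1.merge(df2, on=list(common))

print(sorted(common))
print(merged_df.equals(explicit))
```

Spelling out on= is still worth considering in real code, since an unrelated column that happens to share a name would otherwise silently become a join key.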

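A special thing about merging on common columns is that the filtered rows are the same whether a dataframe is on the right or on the left, because rows are selected by looking up matches in the common columns. The only difference is where the columns end up: columns of the right dataframe that are not in the left dataframe are added to the right of the left dataframe's columns. So unless the order of the columns matters (which is easy to fix with column selection or reindex()), it doesn't really matter which dataframe is on the right and which is on the left. In other words,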

df12 = df1.merge(df2, on=['document_id', 'item_id']).sort_index(axis=1)
df21 = df2.merge(df1, on=['document_id', 'item_id']).sort_index(axis=1)

# df12 and df21 are the same.
df12.equals(df21)     # True
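The column-order point can be seen without sort_index(), as in this sketch that uses reindex() to line the columns up. The frames are rebuilt by hand from the question. Note that the row order matches here because both frames list the matching keys in the same relative order; with differently ordered inputs a sort on the key columns may be needed before comparing.

```python
import pandas as pd

df1 = pd.DataFrame({
    "item_id": [336, 337, 338, 1001, 1002, 1003, 38],
    "position": [1, 2, 3, 1, 2, 3, 10],
    "document_id": [10, 10, 10, 11, 11, 11, 146],
})
df2 = pd.DataFrame({
    "item_id": [337, 1002, 1003],
    "document_id": [10, 11, 11],
    "col1": ["s", "d", "f"],
    "col2": [4, 5, 7],
    "col3": [7, 8, 0],
})

keys = ["document_id", "item_id"]
df12 = df1.merge(df2, on=keys)
df21 = df2.merge(df1, on=keys)

# Same columns, different order: the left frame's columns come first.
print(list(df12.columns))
print(list(df21.columns))

# Put df21's columns into df12's order, then compare.
df21_aligned = df21.reindex(columns=df12.columns)
print(df21_aligned.equals(df12))
```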

This is not the case if the columns to merge on do not have the same names and you have to pass left_on= and right_on= (see #1 in this answer).
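One visible difference when the key names differ is that the result keeps both sets of key columns. The sketch below illustrates this; the renamed columns item and doc_id are hypothetical names introduced only for the example, and the frames are rebuilt by hand from the question.

```python
import pandas as pd

df1 = pd.DataFrame({
    "item_id": [336, 337, 338, 1001, 1002, 1003, 38],
    "position": [1, 2, 3, 1, 2, 3, 10],
    "document_id": [10, 10, 10, 11, 11, 11, 146],
})
df2 = pd.DataFrame({
    "item_id": [337, 1002, 1003],
    "document_id": [10, 11, 11],
    "col1": ["s", "d", "f"],
    "col2": [4, 5, 7],
    "col3": [7, 8, 0],
})

# Hypothetical renaming so the key columns no longer share names.
df2_renamed = df2.rename(columns={"item_id": "item", "document_id": "doc_id"})

merged = df1.merge(
    df2_renamed,
    left_on=["document_id", "item_id"],
    right_on=["doc_id", "item"],
)

# Both pairs of key columns are present; drop the duplicates if unwanted.
cleaned = merged.drop(columns=["doc_id", "item"])
print(list(merged.columns))
print(list(cleaned.columns))
```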
