Compare and join two columns from two dataframes

Question

I have two dataframes with the same column types.

First DataFrame (df1)

data = [['BTC', 2], ['ETH', 1], ['ADA', 100]]
df1 = pd.DataFrame(data, columns=['Coin', 'Quantity'])

Coin     Quantity
BTC          2
ETH          1
ADA        100
...        ...

Second DataFrame (df2)

data = [['BTC', 50000], ['FTM', 50], ['ETH', 1500], ['LRC', 5], ['ADA', 20]]
df2 = pd.DataFrame(data, columns=['code_name', 'selling rate'])

code_name     selling rate
BTC               50000
FTM                  50
ETH                1500
LRC                   5
ADA                  20
...                 ...

Expected output (FTM and LRC should be dropped)

Coin     Quantity     selling rate
BTC          2           50000
ETH          1            1500
ADA        100              20
...        ...             ...

What I tried

df1.merge(df2, how='outer', left_on=['Coin'], right_on=['code_name'])

df = np.where(df1['Coin'] == df2['code_name'])

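Neither attempt gave me the expected output. I searched StackOverflow and could not find a useful answer. If a related question exists, could someone point me to the solution or mark this question as a duplicate?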

Answer 1

You need an inner join, not an outer join. An inner join keeps only the records that are common to both of the tables you are joining.

import pandas as pd

# Make the first data frame
df1 = pd.DataFrame({
    'Coin': ['BTC', 'ETH', 'ADA'],
    'Quantity': [2, 1, 100]
})

# Make the second data frame
df2 = pd.DataFrame({
    'code_name': ['BTC', 'FTM', 'ETH', 'LRC', 'ADA'],
    'selling_rate': [50000, 50, 1500, 5, 20]
})

# Merge the data frames via inner join. This only keeps entries that appear in
# both data frames
full_df = df1.merge(df2, how = 'inner', left_on = 'Coin', right_on = 'code_name')

# Drop the duplicate column
full_df = full_df.drop('code_name', axis = 1)
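
To see why the outer join from the question keeps FTM and LRC, the two joins can be compared side by side. This is a minimal sketch that assumes the same example data as above; `indicator=True` is a standard `merge` parameter that adds a `_merge` column showing which frame each row came from. The variable names `outer`, `inner` and `unmatched` are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'Coin': ['BTC', 'ETH', 'ADA'], 'Quantity': [2, 1, 100]})
df2 = pd.DataFrame({
    'code_name': ['BTC', 'FTM', 'ETH', 'LRC', 'ADA'],
    'selling_rate': [50000, 50, 1500, 5, 20],
})

# Outer join: rows that exist in only one frame are kept, and the
# _merge column records whether a row is 'both', 'left_only' or 'right_only'.
outer = df1.merge(df2, how='outer', left_on='Coin', right_on='code_name',
                  indicator=True)
unmatched = outer.loc[outer['_merge'] != 'both', 'code_name'].tolist()

# Inner join: only coins present in both frames survive.
inner = (df1.merge(df2, how='inner', left_on='Coin', right_on='code_name')
            .drop(columns='code_name'))

print(sorted(unmatched))
print(inner)
```

The coins listed in `unmatched` are exactly the rows that the inner join drops.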

Answer 2

merge() can be slow on large datasets, so I prefer not to use it whenever I have a faster option. I would therefore suggest the following:

full_df = df1.copy()
full_df['selling_rate'] = list(
    df2['selling_rate'][df2['code_name'].isin(df1['Coin'].unique())])

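Note: this gives the expected result only if df1 and df2 are in the same order with respect to Coin and code_name. If they are not, call sort_values() on both before running the code above. It also assumes that every coin in df1 appears exactly once in df2; otherwise the list length may not match the number of rows in df1 and the assignment raises a ValueError.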
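
If relying on row order is a concern, one order-independent alternative is to look rates up by coin code with `Series.map`. This is a different technique from the code in this answer, shown only as a sketch: it assumes `code_name` values in df2 are unique, and `rates` is an illustrative name. No claim is made here about its speed relative to merge().

```python
import pandas as pd

# df1 is deliberately in a different order than df2.
df1 = pd.DataFrame({'Coin': ['ADA', 'BTC', 'ETH'], 'Quantity': [100, 2, 1]})
df2 = pd.DataFrame({
    'code_name': ['BTC', 'FTM', 'ETH', 'LRC', 'ADA'],
    'selling_rate': [50000, 50, 1500, 5, 20],
})

# Lookup Series: index is the coin code, values are the selling rates.
# Assumption: each code_name occurs only once in df2.
rates = df2.set_index('code_name')['selling_rate']

full_df = df1.copy()
# Each Coin is matched by label, so row order in df2 does not matter.
full_df['selling_rate'] = full_df['Coin'].map(rates)

# Coins missing from df2 would get NaN; dropping them mimics an inner join.
full_df = full_df.dropna(subset=['selling_rate'])

print(full_df)
```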

Answer 1: score 3, accepted, 2021-04-09 07:06:30

Answer 2: score 1, 2021-04-09 07:52:47
