Compare two CSV files of different lengths to find matching values

Question

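I want to compare two sets of data and find the matches. I have one file (file1.csv) with 1,400 rows, which looks like this:

I have another file (file2.csv) with about 46,000 rows, which looks like this: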

accountNumber, user, accountNumber2, address 
456,example@email.com,0,1 Lane
001,example1@email.com,175,2 Lane
002,example2@email.com,789,3 Lane
195,example3@email.com,0,4 Lane
123,example4@email.com,0,5 Lane
689,example5@email.com,0,6 Lane
003,example6@email.com,0,7 Lane
004,example7@email.com,0,8 Lane

I want to use the IDs from file1 to look up the matching emails in file2. The output I want is a new file (file3), for example:

ID,Email
123, example4@email.com
456, example@email.com
789, example2@email.com
145, Not found
165, Not found
175, example1@email.com
185, Not found
195, example3@email.com

Here is what I have tried:

import pandas as pd

file1_df = pd.read_csv('file1.csv')
file2_df = pd.read_csv('file2.csv')

def search():
    for account_id in file1_df['account_id']:
        print("ID: ",account_id)
        id_loc = file1_df[file1_df['account_id'] == account_id].index.values
        print("id_loc",id_loc)

        try:
            accountNumber = file2_df[file2_df['accountNumber'] == account_id]['user'].values[0]
            print(accountNumber)
            accountNumber_loc = file2_df.loc[file2_df['accountNumber'] == account_id].index.values
            print(account_id, "Found at: ", accountNumber_loc)
            file1_df.loc[id_loc, "Located"] = accountNumber
        except Exception:
            pass
            try:
                accountNumber2 = file2_df[file2_df['accountNumber2'] == account_id]['user'].values[0]
                print(accountNumber2)
                accountNumber2_loc = file2_df.loc[file2_df['accountNumber2'] == account_id].index.values
                print(account_id, "Found at: ", accountNumber2_loc)
                file1_df.loc[id_loc, "Located"] = accountNumber2
            except Exception:
                print("Not Found")
                file1_df.loc[id_loc, "Located"] = "Not found"

search()
file1_df.to_csv('file3.csv')

I keep getting this error:

IndexError: index 0 is out of bounds for axis 0 with size 0

It seems to almost work on small files, but as soon as I try it on the real data I keep getting the IndexError. Is there a better way to find these matches?

Answer 1

As @Manakin already mentioned, this is just a simple join with file1_df as the left-hand reference. I added the rest of the code to get the data into the required format.

import pandas as pd

file1_df = pd.read_csv('file1.csv')
file2_df = pd.read_csv('file2.csv')

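# Left join keeps every file1 row; IDs with no match in file2 get NaN in 'user'.
# 'account_id' is the file1 column name used in the question's code.
merged = pd.merge(file1_df, file2_df, left_on='account_id', right_on='accountNumber', how='left')
file3_df = (merged[['account_id', 'user']]
            .rename(columns={'account_id': 'ID', 'user': 'Email'})
            .fillna({'Email': 'Not found'}))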
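
The join above matches only on accountNumber, while the expected output in the question also pairs 789 and 175 through accountNumber2. Below is a minimal sketch of one way to cover both columns. It is not part of the original answer, and it rests on a few assumptions: the file1 column is called account_id (as in the question's code), a 0 in accountNumber2 means "no second account number", and small inline samples stand in for the real CSV files. build_lookup is a hypothetical helper introduced only for this illustration.

```python
import io
import pandas as pd

# Inline samples standing in for file1.csv and file2.csv (assumed data,
# modeled on the rows shown in the question).
file1_df = pd.read_csv(io.StringIO("account_id\n123\n456\n789\n145\n175\n195\n"))
file2_df = pd.read_csv(io.StringIO(
    "accountNumber,user,accountNumber2,address\n"
    "456,example@email.com,0,1 Lane\n"
    "001,example1@email.com,175,2 Lane\n"
    "002,example2@email.com,789,3 Lane\n"
    "195,example3@email.com,0,4 Lane\n"
    "123,example4@email.com,0,5 Lane\n"
))

def build_lookup(df, key_col):
    """Hypothetical helper: map each value in key_col to the first 'user' seen for it."""
    return df.drop_duplicates(subset=key_col).set_index(key_col)['user']

# Lookup by the primary account number.
primary = build_lookup(file2_df, 'accountNumber')
# Lookup by the secondary account number, skipping the 0 placeholder (assumption).
secondary = build_lookup(file2_df[file2_df['accountNumber2'] != 0], 'accountNumber2')

# Try the primary column first, fall back to the secondary, then mark the rest.
emails = file1_df['account_id'].map(primary)
emails = emails.fillna(file1_df['account_id'].map(secondary)).fillna('Not found')

file3_df = pd.DataFrame({'ID': file1_df['account_id'], 'Email': emails})
print(file3_df.to_string(index=False))
```

Because missing IDs simply come back as NaN from map, nothing is ever indexed with [0] on an empty selection, so the IndexError from the question cannot occur here. With the real files, the two read_csv calls would take 'file1.csv' and 'file2.csv' instead of the inline samples, and file3_df.to_csv('file3.csv', index=False) would write the result.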
