Comparing two CSV files with Python pandas

Question

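I have two CSV files, each consisting of two columns.

The first column holds the product ID and the second holds the serial number.

I need to look up every serial number from the first CSV and find its match in the second CSV. The resulting report should contain the matched serial number plus the corresponding product ID from each CSV, in separate columns. I tried modifying the code below, with no luck.

How would you approach this?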

import pandas as pd
A=set(pd.read_csv("c1.csv", index_col=False, header=None)[0]) #reads the csv, takes only the first column and creates a set out of it.
B=set(pd.read_csv("c2.csv", index_col=False, header=None)[0]) #same here
print(A-B) #set A - set B gives back everything thats only in A.
print(B-A) # same here, other way around.
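The set approach in the question can find which serial numbers the two files share, if it reads the serial-number column (assumed here to be column 1, since product IDs come first) and uses set intersection instead of difference. A minimal sketch, with `io.StringIO` stand-ins for c1.csv and c2.csv holding made-up values. Note that the product IDs are lost once the column is turned into a set, which is why the answers below reach for `merge`.

```python
import io

import pandas as pd

# Hypothetical stand-ins for c1.csv and c2.csv: column 0 = product ID, column 1 = serial number
c1 = io.StringIO("1455,44\n5452,55\n3775,66\n")
c2 = io.StringIO("7000,44\n2000,55\n1000,77\n")

# Take column 1 (serial numbers) instead of column 0
A = set(pd.read_csv(c1, header=None)[1])
B = set(pd.read_csv(c2, header=None)[1])

matched = A & B  # serial numbers present in both files
print(sorted(int(s) for s in matched))  # [44, 55]
```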

Answer 1

I think you need merge:

import pandas as pd

A = pd.DataFrame({'product id':    [1455, 5452, 3775],
                  'serial number': [44, 55, 66]})

print(A)

B = pd.DataFrame({'product id':    [7000, 2000, 1000],
                  'serial number': [44, 55, 77]})

print(B)

print(pd.merge(A, B, on='serial number'))
   product id_x  serial number  product id_y
0          1455             44          7000
1          5452             55          2000
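Answer 1 builds the frames inline; with the question's header-less files, the same merge can be applied after naming the columns on read. A minimal sketch, assuming the two-column layout (product ID, then serial number) and using `io.StringIO` in place of c1.csv and c2.csv. The `suffixes` argument labels each product ID column by its source file instead of the default `_x`/`_y`.

```python
import io

import pandas as pd

cols = ["product id", "serial number"]  # assumed column order in both files

# Stand-ins for c1.csv and c2.csv (no header row), reusing Answer 1's sample values
c1 = io.StringIO("1455,44\n5452,55\n3775,66\n")
c2 = io.StringIO("7000,44\n2000,55\n1000,77\n")

A = pd.read_csv(c1, header=None, names=cols)
B = pd.read_csv(c2, header=None, names=cols)

# Inner join (the default) keeps only serial numbers found in both files
report = pd.merge(A, B, on="serial number", suffixes=("_c1", "_c2"))
print(report)
# To save the report: report.to_csv("matches.csv", index=False)
```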

Answer 2

Try this:

import pandas as pd

A = pd.read_csv("c1.csv", header=None, usecols=[0], names=['col']).drop_duplicates()
B = pd.read_csv("c2.csv", header=None, usecols=[0], names=['col']).drop_duplicates()
# A - B
pd.merge(A, B, on='col', how='left', indicator=True).query("_merge == 'left_only'")
# B - A
pd.merge(A, B, on='col', how='right', indicator=True).query("_merge == 'right_only'")
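Both differences, plus the matches, can also come from a single outer merge, since `indicator=True` labels every row as `left_only`, `right_only` or `both`. A sketch with small in-memory frames standing in for the de-duplicated columns read above (the values are made up):

```python
import pandas as pd

# Hypothetical stand-ins for A and B after read_csv(...).drop_duplicates()
A = pd.DataFrame({"col": [44, 55, 66]})
B = pd.DataFrame({"col": [44, 55, 77]})

m = pd.merge(A, B, on="col", how="outer", indicator=True)

only_a = m.loc[m["_merge"] == "left_only", "col"].tolist()   # A - B
only_b = m.loc[m["_merge"] == "right_only", "col"].tolist()  # B - A
both = m.loc[m["_merge"] == "both", "col"].tolist()          # A & B
print(only_a, only_b, both)
```

One merge replaces the separate left and right merges, at the cost of filtering the `_merge` column three ways.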

Answer 3

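You can convert each DataFrame to a set of row tuples, which ignores the index when comparing the data, and then use the set operations symmetric_difference and difference: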

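import pandas as pd

# df1 and df2 are the DataFrames shown below
ds1 = set(tuple(values) for values in df1.values.tolist())
ds2 = set(tuple(values) for values in df2.values.tolist())

changed = ds1.symmetric_difference(ds2)  # rows present in only one frame (not printed below)
print(df1, '\n\n')
print(df2, '\n\n')

print(pd.DataFrame(list(ds1.difference(ds2))), '\n\n')  # rows only in df1
print(pd.DataFrame(list(ds2.difference(ds1))), '\n\n')  # rows only in df2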

DF1

    id  Name  score isEnrolled               Comment
0  111  Jack   2.17       True  He was late to class
1  112  Nick   1.11      False             Graduated
2  113   Zoe   4.12       True                   NaN

DF2

    id  Name  score isEnrolled               Comment
0  111  Jack   2.17       True  He was late to class
1  112  Nick   1.21      False             Graduated
2  113   Zoe   4.12      False           On vacation

Output

     0     1     2      3          4
0  113   Zoe  4.12   True        NaN
1  112  Nick  1.11  False  Graduated 


     0     1     2      3            4
0  113   Zoe  4.12  False  On vacation
1  112  Nick  1.21  False    Graduated
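Answer 3 leaves df1 and df2 undefined. A self-contained sketch that rebuilds them from the tables above and applies the same set-difference idea; `itertuples(index=False, name=None)` yields plain row tuples, and sorting by `id` makes the otherwise arbitrary set order reproducible. As in the answer, whole rows are compared, so a row that differs in any column shows up in both results.

```python
import numpy as np
import pandas as pd

# DF1 and DF2 from the tables above
df1 = pd.DataFrame({
    "id": [111, 112, 113],
    "Name": ["Jack", "Nick", "Zoe"],
    "score": [2.17, 1.11, 4.12],
    "isEnrolled": [True, False, True],
    "Comment": ["He was late to class", "Graduated", np.nan],
})
df2 = pd.DataFrame({
    "id": [111, 112, 113],
    "Name": ["Jack", "Nick", "Zoe"],
    "score": [2.17, 1.21, 4.12],
    "isEnrolled": [True, False, False],
    "Comment": ["He was late to class", "Graduated", "On vacation"],
})

# Each row becomes a hashable tuple; the index is ignored
ds1 = set(df1.itertuples(index=False, name=None))
ds2 = set(df2.itertuples(index=False, name=None))

# Rows only in df1 and rows only in df2, keeping the original column names
only_df1 = pd.DataFrame(list(ds1 - ds2), columns=df1.columns).sort_values("id")
only_df2 = pd.DataFrame(list(ds2 - ds1), columns=df2.columns).sort_values("id")
print(only_df1, "\n")
print(only_df2)
```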

Answer 4

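import pandas as pd

first_one = pd.read_csv(file_path)
# same way for second_one (second_file_path is a placeholder name)
second_one = pd.read_csv(second_file_path)
# if product_id is the first column, its position is 0
len_ = min(len(first_one), len(second_one))
i = 0
while i < len_:
    # compares only rows at the same position in both files
    if first_one.iloc[i, 0] == second_one.iloc[i, 0]:
        # it is a match; do whatever you want with this matched data
        pass
    i += 1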
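Answer 4's loop only compares rows that sit at the same position in both files, so a serial number that appears on a different row is missed. A vectorized sketch that matches on value instead, using `Series.isin`; the frames and column names are assumptions mirroring Answer 1's sample data, with the second file's rows deliberately shuffled:

```python
import pandas as pd

# Hypothetical data in the question's layout: product ID, then serial number
first_one = pd.DataFrame({"product id": [1455, 5452, 3775], "serial number": [44, 55, 66]})
second_one = pd.DataFrame({"product id": [2000, 7000, 1000], "serial number": [55, 44, 77]})

# Rows of first_one whose serial number occurs anywhere in second_one
matches = first_one[first_one["serial number"].isin(second_one["serial number"])]
print(matches)
```

`isin` only filters the first file; attaching the second file's product IDs still needs a merge such as the one in Answer 1.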

Answer 1: score 5, accepted, 2017-02-23 14:32:07

Answer 2: score 3, 2017-02-23 14:34:03

Answer 3: score 1, 2017-02-23 14:34:46

Answer 4: score -1, 2017-02-23 14:29:13
