
How to compare two CSV files by column and save the differences to a CSV file using pandas in Python

I have two CSV files like the following:

File1

x1 10.00 a1
x2 10.00 a2
x3 11.00 a1
x4 10.50 a2
x5 10.00 a3
x6 12.00 a3

File2

x1
x4
x5

I would like to create a new file that contains the first-column values of File1 that are missing from File2:

x2
x3
x6

using pandas or plain Python.

Use Series.isin with ~ to keep only the values of the first column, df1[0], that do not exist in df2[0], selecting them with DataFrame.loc and boolean indexing:

import pandas as pd

# Create a DataFrame from the first file.
# file1 is the path to the first CSV; set sep to the file's actual delimiter.
df1 = pd.read_csv(file1, sep=";", header=None)
print(df1)
    0     1   2
0  x1  10.0  a1
1  x2  10.0  a2
2  x3  11.0  a1
3  x4  10.5  a2
4  x5  10.0  a3
5  x6  12.0  a3

# Create a DataFrame from the second file (file2 is its path; it has a single column).
df2 = pd.read_csv(file2, header=None, sep='|')
print(df2)
    0
0  x1
1  x4
2  x5

# Keep the first-column values of df1 that are absent from df2's first column.
s = df1.loc[~df1[0].isin(df2[0]), 0]
print(s)
1    x2
2    x3
5    x6
Name: 0, dtype: object

# Write the result to a file, without the index or a header row.
s.to_csv('new.csv', index=False, header=False)
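
For anyone who wants to try the approach without creating files first, here is a minimal self-contained sketch. It assumes the first file is semicolon-separated (as in the answer's read_csv call; the question does not confirm the delimiter) and uses in-memory io.StringIO buffers as stand-ins for the real files. The helper name keys_missing_from is hypothetical, introduced only for illustration.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-ins for the two files.
# Assumption: File1 is semicolon-separated; adjust sep for real data.
file1 = io.StringIO(
    "x1;10.00;a1\n"
    "x2;10.00;a2\n"
    "x3;11.00;a1\n"
    "x4;10.50;a2\n"
    "x5;10.00;a3\n"
    "x6;12.00;a3\n"
)
file2 = io.StringIO("x1\nx4\nx5\n")

df1 = pd.read_csv(file1, sep=";", header=None)
df2 = pd.read_csv(file2, header=None)


def keys_missing_from(left, right, col=0):
    """Return the values of left[col] that do not appear in right[col]."""
    mask = ~left[col].isin(right[col])  # True where the key is absent from right
    return left.loc[mask, col]


missing = keys_missing_from(df1, df2)

# Write to an in-memory buffer here; pass a path such as 'new.csv' for a real file.
out = io.StringIO()
missing.to_csv(out, index=False, header=False)
print(out.getvalue())
```

With real files, replace the StringIO objects with file paths and pass a file name to to_csv, as in the answer above.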
