How to compare two CSV files by column and save the differences to a CSV file using pandas in Python
I have two CSV files like the following:
File1
x1 10.00 a1
x2 10.00 a2
x3 11.00 a1
x4 10.50 a2
x5 10.00 a3
x6 12.00 a3
File2
x1
x4
x5
I would like to create a new file that contains:
x2
x3
x6
using pandas or plain Python.
Use Series.isin together with ~ to keep only the values of df1[0] that do not exist in df2[0], and select the first column with DataFrame.loc and boolean indexing.
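To make the mask concrete, here is a minimal sketch on in-memory data rather than files. The names `ids1`, `ids2`, `mask` and `missing` are illustrative only and do not come from the answer; they stand in for the first column of each file.

```python
import pandas as pd

# Illustrative stand-ins for the first column of each file
ids1 = pd.Series(["x1", "x2", "x3", "x4", "x5", "x6"])
ids2 = pd.Series(["x1", "x4", "x5"])

# isin marks the values of ids1 that also appear in ids2; ~ inverts the mask
mask = ~ids1.isin(ids2)

# Boolean indexing keeps only the rows where the mask is True
missing = ids1[mask]
print(missing.tolist())  # ['x2', 'x3', 'x6']
```

The original index labels (1, 2, 5) are preserved by the filter, which is why the result further down prints with those labels.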
import pandas as pd

# file1 and file2 are the paths to the two input files

# create a DataFrame from the first file
df1 = pd.read_csv(file1, sep=";", header=None)
print(df1)
#     0     1   2
# 0  x1  10.0  a1
# 1  x2  10.0  a2
# 2  x3  11.0  a1
# 3  x4  10.5  a2
# 4  x5  10.0  a3
# 5  x6  12.0  a3

# create a DataFrame from the second file
df2 = pd.read_csv(file2, header=None, sep='|')
print(df2)
#     0
# 0  x1
# 1  x4
# 2  x5

# keep the first-column values of df1 that are not in df2
s = df1.loc[~df1[0].isin(df2[0]), 0]
print(s)
# 1    x2
# 2    x3
# 5    x6
# Name: 0, dtype: object

# write to file
s.to_csv('new.csv', index=False, header=False)
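Since the question also allows plain Python, the same comparison can probably be done with only the standard library. The following is a rough sketch, not part of the original answer: the helper name `missing_keys` is hypothetical, the `;` delimiter is assumed from the `sep=";"` used above, and `io.StringIO` objects stand in for real files so the example is self-contained.

```python
import csv
import io

def missing_keys(file1, file2, delimiter=";"):
    """Return first-column values of file1 that do not appear in file2."""
    # Collect the keys of the second file in a set for fast lookups
    seen = {row[0] for row in csv.reader(file2, delimiter=delimiter) if row}
    # Keep the keys of the first file that were not seen, preserving order
    return [row[0] for row in csv.reader(file1, delimiter=delimiter)
            if row and row[0] not in seen]

# In-memory stand-ins for the two files (the ';' delimiter is an assumption)
f1 = io.StringIO(
    "x1;10.00;a1\nx2;10.00;a2\nx3;11.00;a1\n"
    "x4;10.50;a2\nx5;10.00;a3\nx6;12.00;a3\n"
)
f2 = io.StringIO("x1\nx4\nx5\n")

result = missing_keys(f1, f2)

# Write one key per row; a real file opened with newline="" would work the same way
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows([key] for key in result)
print(out.getvalue())
```

With real files, the `StringIO` objects would be replaced by handles from `open(path, newline="")`. The set lookup keeps the comparison linear in the size of the two files.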