How to compare two CSV files by column and save the differences to a CSV file using Pandas in Python

Question

I have two CSV files like the following:

File1

x1 10.00 a1
x2 10.00 a2
x3 11.00 a1
x4 10.50 a2
x5 10.00 a3
x6 12.00 a3

File2

x1
x4
x5

I would like to create a new file that contains:

x2
x3
x6

using pandas or plain Python.

Answer 1

Use Series.isin with ~ to keep only the values in the first column of df1 that do not exist in df2[0], selecting them with DataFrame.loc and boolean indexing:

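import pandas as pd

# file1 and file2 are placeholder paths; point them at your own files
file1 = 'file1.csv'
file2 = 'file2.csv'

# create DataFrame from first file (set sep to the delimiter your file actually uses)
df1 = pd.read_csv(file1, sep=";", header=None)
print(df1)
#     0     1   2
# 0  x1  10.0  a1
# 1  x2  10.0  a2
# 2  x3  11.0  a1
# 3  x4  10.5  a2
# 4  x5  10.0  a3
# 5  x6  12.0  a3

# create DataFrame from second file; it has a single column, so sep='|'
# simply never splits a line (assuming the data contains no '|')
df2 = pd.read_csv(file2, header=None, sep='|')
print(df2)
#     0
# 0  x1
# 1  x4
# 2  x5

# keep the first-column values of df1 that are not present in df2
s = df1.loc[~df1[0].isin(df2[0]), 0]
print(s)
# 1    x2
# 2    x3
# 5    x6
# Name: 0, dtype: object

# write to file
s.to_csv('new.csv', index=False, header=False)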
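
For anyone who wants to try this without creating files first, here is a minimal self-contained sketch. It assumes the files are whitespace-separated exactly as shown in the question (the real delimiter may differ), and the helper names missing_keys_pandas and missing_keys_plain are made up for illustration. The second helper does the same comparison with a plain Python set, since the question allows either approach.

```python
import io

import pandas as pd

# Sample data copied from the question. Whitespace as the separator is an
# assumption based on how the question displays the files.
FILE1_TEXT = """x1 10.00 a1
x2 10.00 a2
x3 11.00 a1
x4 10.50 a2
x5 10.00 a3
x6 12.00 a3
"""
FILE2_TEXT = """x1
x4
x5
"""


def missing_keys_pandas(file1, file2):
    """Return first-column values of file1 that are absent from file2."""
    df1 = pd.read_csv(file1, sep=r"\s+", header=None)
    df2 = pd.read_csv(file2, sep=r"\s+", header=None)
    # ~isin(...) is True for rows whose key does not appear in df2
    mask = ~df1[0].isin(df2[0])
    return df1.loc[mask, 0].tolist()


def missing_keys_plain(file1, file2):
    """Same comparison using only the standard library."""
    # Collect the keys of the second file in a set for fast lookups
    seen = {line.split()[0] for line in file2 if line.strip()}
    # Keep file1 keys, in their original order, that are not in that set
    return [
        line.split()[0]
        for line in file1
        if line.strip() and line.split()[0] not in seen
    ]


# io.StringIO stands in for open file handles here
result_pandas = missing_keys_pandas(io.StringIO(FILE1_TEXT), io.StringIO(FILE2_TEXT))
result_plain = missing_keys_plain(io.StringIO(FILE1_TEXT), io.StringIO(FILE2_TEXT))

print(result_pandas)  # ['x2', 'x3', 'x6']
print(result_plain)   # ['x2', 'x3', 'x6']
```

With real files, pass paths or open file objects to the pandas helper in place of the StringIO objects; the plain-Python helper expects an open file or any iterable of lines.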

Answer 1, 2019-03-30 23:14:25
