Python - Compare 2 csv files and delete rows

I have 2 CSV files with ~10,000 lines:

  1. a CSV of the file names in a directory
  2. a CSV with data for each file in this directory (point 1)

Example of the content of each CSV file:
csv_1:

50001200000000016
50001200000000021
50001200000000034
50001200000000048

csv_2:

50001200000000016;187
50001200000000021;287
50001200000000034;187
50001200000000048;5

I want to keep in csv_2 only the rows whose first column matches a value that exists in csv_1.

Example:
If the line 50001200000000016 doesn't exist in csv_1, delete the row in csv_2 that begins with 50001200000000016.

Thanks for the help.

There are many ways to do this. If the CSV is simple (i.e. no tricky quoting or special characters, only those two columns), you could read the first file into a set and loop over the lines of the second one.
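A minimal sketch of that set-based approach using only the standard-library `csv` module. It assumes csv_1 holds one ID per line, csv_2 is `;`-separated with the ID in its first column, and both files use the same encoding; the function names `filter_rows` and `filter_files` are hypothetical.

```python
import csv


def filter_rows(id_lines, data_lines, delimiter=";"):
    """Return the rows of data_lines whose first field appears in id_lines."""
    # First file: one ID per line; a set gives fast membership tests
    keep = {row[0] for row in csv.reader(id_lines) if row}
    # Second file: keep a row only if its first column is a known ID
    # (blank lines parse as [] and are skipped)
    return [row for row in csv.reader(data_lines, delimiter=delimiter)
            if row and row[0] in keep]


def filter_files(ids_path, data_path, out_path, encoding="utf-8"):
    """Hypothetical file wrapper: write the filtered rows of data_path to out_path."""
    # newline="" is what the csv module expects for files it reads and writes
    with open(ids_path, newline="", encoding=encoding) as ids, \
         open(data_path, newline="", encoding=encoding) as data, \
         open(out_path, "w", newline="", encoding=encoding) as out:
        csv.writer(out, delimiter=";").writerows(filter_rows(ids, data))
```

IDs are compared as text, so `0016` and `16` would count as different IDs. Each file is read once, which is plenty fast for ~10,000 lines.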

However, given your specifications (only ~10k lines), this shouldn't require any particular optimization and is easy to do in memory with pandas:

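import pandas as pd

# csv_1: one file name (ID) per line, no header row
df1 = pd.read_csv('csv_1.csv', header=None)
# csv_2: "ID;value" rows, no header row
df2 = pd.read_csv('csv_2.csv', header=None, sep=';')

# Keep only the rows of csv_2 whose first column (column 0) appears in csv_1
df2[df2[0].isin(df1[0])].to_csv('new_file.csv', sep=';', header=False, index=False)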

The problem was the file encoding; here is the code that works in PyCharm / Jupyter Notebook:

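import pandas as pd

# 'ANSI' is Python's Windows-only alias for the 'mbcs' codec (the system ANSI
# code page); on other platforms name the code page explicitly, e.g. 'cp1252'
df1 = pd.read_csv(r'csv_1.csv', encoding='ANSI', header=None)
print(df1)  # sanity check: the IDs were read correctly
df2 = pd.read_csv(r'csv_2.csv', encoding='ANSI', header=None, sep=';')
print(df2)

df2[df2[0].isin(df1[0])].to_csv('new_file.csv', encoding='ANSI', sep=';', header=False, index=False)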
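Since the fix here was the encoding, one rough way to choose a value for `encoding=` is to try a few candidate codecs on the file's raw bytes and keep the first that decodes without error. This is a heuristic sketch: the function name `guess_encoding` and the candidate list are assumptions, and a clean decode does not prove the encoding is right (`latin-1` accepts any byte sequence, so it only serves as a last resort).

```python
def guess_encoding(raw: bytes, candidates=("utf-8-sig", "cp1252", "latin-1")) -> str:
    """Return the first candidate codec that decodes raw without errors."""
    for enc in candidates:
        try:
            raw.decode(enc)
        except UnicodeDecodeError:
            continue  # this codec rejects some byte sequence; try the next one
        return enc
    raise ValueError("no candidate encoding could decode the data")
```

For files of this size, read the whole file in binary mode (`open(path, "rb").read()`) rather than a prefix, since cutting a multi-byte UTF-8 character in half would make the UTF-8 check fail spuriously.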

Thanks, all.
