Python 3.6: Compare two large gzipped csv files & fetch difference records
I have 2 gzipped csv files, IMFBOP2017_1.csv.gz and IMFBOP2017_2.csv.gz, with the same columns in both files, i.e. "Location, Indicator, Measure, Unit, Frequency, Date".
Total rows: 60 million+.
I want to compare both files and display the rows of IMFBOP2017_1 that are not present in IMFBOP2017_2.
My plan is to import both files into dataframes, add an extra column "compare" to both dataframes, and populate it by concatenating all the fields, like Location|Indicator|Measure|Unit|Frequency|Date, then do a NOT IN operation.
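The plan above could be sketched like this (using tiny in-memory DataFrames as illustrative stand-ins for the real files; the sample rows are hypothetical, not actual IMF data):

```python
import pandas as pd

cols = ["Location", "Indicator", "Measure", "Unit", "Frequency", "Date"]

# Stand-ins for IMFBOP2017_1 and IMFBOP2017_2 (made-up sample rows)
df1 = pd.DataFrame([["US", "BOP", "M1", "USD", "A", "2017"],
                    ["DE", "BOP", "M1", "EUR", "A", "2017"]], columns=cols)
df2 = pd.DataFrame([["US", "BOP", "M1", "USD", "A", "2017"]], columns=cols)

# Build the concatenated "compare" key: Location|Indicator|...|Date
df1["compare"] = df1[cols].astype(str).agg("|".join, axis=1)
df2["compare"] = df2[cols].astype(str).agg("|".join, axis=1)

# NOT IN: keep rows of df1 whose key does not appear in df2
diff = df1[~df1["compare"].isin(df2["compare"])].drop(columns="compare")
print(diff)
```

With the sample rows above, only the "DE" row survives the filter, since it has no match in the second frame.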
I think this is a costly process. Is there any simpler solution for this?
Pandas can read gzipped data files with the ordinary pandas.read_csv(). How to do a diff between two dataframes is described in Pandas: Diff of two Dataframes.