
How to join two large CSV files?

I have two large .csv files that I would like to join.

file1.csv has the following structure:

productcode; *many useless columns* ; startdate; enddate; *some other useless columns*

file2.csv has the following structure:

productcode; *many useless columns different from file1* ; page; startdate; enddate; *some other useless columns*

I would like to join the two files into a single file (let's say, out.csv) with the same structure as file1.csv but with the "page" column from file2.csv added, i.e.:

productcode; *useless columns* ; page; startdate; enddate; *useless columns*

The join conditions are the same productcode and overlapping dates, i.e.:

file1.productcode == file2.productcode

and

!(file1.enddate < file2.startdate or file2.enddate < file1.startdate)
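Because the files are large, one way to apply these two conditions without a database is to hold only the four needed columns of file2 in memory, keyed by productcode, and stream file1 row by row. The sketch below uses only the Python standard library. It assumes both files have a header row using the column names productcode, page, startdate and enddate, that the delimiter is ";", and that dates are ISO 8601 (YYYY-MM-DD); the helper names `overlaps`, `_parse` and `join_csv` are hypothetical. A file1 row that overlaps several file2 rows is written once per match.

```python
import csv
from collections import defaultdict
from datetime import date


def overlaps(start1, end1, start2, end2):
    """True when the closed ranges [start1, end1] and [start2, end2] overlap.

    This is the question's condition !(end1 < start2 or end2 < start1).
    """
    return not (end1 < start2 or end2 < start1)


def _parse(value):
    # Assumption: dates are ISO 8601 (YYYY-MM-DD); use datetime.strptime otherwise.
    return date.fromisoformat(value.strip())


def join_csv(file1, file2, out, delimiter=";"):
    """Write file1's rows with file2's "page" inserted just before startdate.

    file1, file2 and out are open text files (ideally opened with newline="").
    Only productcode/page/startdate/enddate from file2 are kept in memory;
    file1 is streamed one row at a time.
    """
    reader2 = csv.reader(file2, delimiter=delimiter)
    h2 = [name.strip() for name in next(reader2)]
    code2, page2 = h2.index("productcode"), h2.index("page")
    start2, end2 = h2.index("startdate"), h2.index("enddate")
    # productcode -> list of (startdate, enddate, page) taken from file2
    index = defaultdict(list)
    for row in reader2:
        index[row[code2].strip()].append(
            (_parse(row[start2]), _parse(row[end2]), row[page2].strip())
        )

    reader1 = csv.reader(file1, delimiter=delimiter)
    h1 = [name.strip() for name in next(reader1)]
    code1, start1, end1 = h1.index("productcode"), h1.index("startdate"), h1.index("enddate")
    writer = csv.writer(out, delimiter=delimiter)
    # Output keeps file1's column layout, with "page" placed before startdate.
    writer.writerow(h1[:start1] + ["page"] + h1[start1:])
    for row in reader1:
        s1, e1 = _parse(row[start1]), _parse(row[end1])
        # index.get avoids adding empty entries for product codes missing from file2
        for s2, e2, page in index.get(row[code1].strip(), []):
            if overlaps(s1, e1, s2, e2):
                writer.writerow(row[:start1] + [page] + row[start1:])
```

To use it on the real files, open file1.csv, file2.csv and out.csv (the last in "w" mode), all with newline="", and call join_csv(f1, f2, out). Indexing file2 rather than file1 keeps memory proportional to four short columns per file2 row; if file2 is much larger than file1, swapping the roles is an option.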

However, I have no idea how to do that. One possibility would be to import the two CSVs into MySQL, process them there, and then export the result to a final CSV file. However, that takes time (and consumes resources).

I'm open to any suggestions.

Load them with pandas and use the .join() function to combine the two on the required columns.
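A minimal pandas sketch of this suggestion follows. Note that `DataFrame.join` matches against the other frame's index, so for a column key like productcode the sketch uses `DataFrame.merge` instead (calling `join` after `set_index("productcode")` on file2 would be equivalent). It assumes a ";" delimiter, header rows with the column names from the question, dates that `pd.to_datetime` can parse, and no "page" column in file1; `join_with_pandas` is a hypothetical name. The merge on productcode alone produces every pairing per product before the date filter is applied, and both files are loaded fully into memory, so very large inputs may need `read_csv(..., chunksize=...)` on file1.

```python
import pandas as pd


def join_with_pandas(path1, path2, out_path, sep=";"):
    # skipinitialspace strips the blanks after each ";" seen in the question's headers.
    df1 = pd.read_csv(path1, sep=sep, skipinitialspace=True)
    # Only the join keys and "page" are needed from file2.
    df2 = pd.read_csv(path2, sep=sep, skipinitialspace=True,
                      usecols=["productcode", "page", "startdate", "enddate"])

    # Equality part of the join; file2's dates get a "_2" suffix.
    merged = df1.merge(df2, on="productcode", suffixes=("", "_2"))

    # Overlap part: keep rows where !(end1 < start2 or end2 < start1).
    s1, e1 = pd.to_datetime(merged["startdate"]), pd.to_datetime(merged["enddate"])
    s2, e2 = pd.to_datetime(merged["startdate_2"]), pd.to_datetime(merged["enddate_2"])
    keep = ~((e1 < s2) | (e2 < s1))

    # file1's column order, with "page" inserted just before startdate.
    cols = list(df1.columns)
    pos = cols.index("startdate")
    out_cols = cols[:pos] + ["page"] + cols[pos:]
    merged.loc[keep, out_cols].to_csv(out_path, sep=sep, index=False)
```

The original date strings from file1 are written back unchanged; the parsed datetimes are used only for the comparison.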

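Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.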

Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM