How to join two large CSV files?
I have two large .csv files that I would like to join.
file1.csv has the following structure:
productcode; *many useless columns* ; startdate; enddate; *some other useless columns*
file2.csv has the following structure:
productcode; *many useless columns different from file1* ; page; startdate; enddate; *some other useless columns*
I would like to join the two files into a single file (let's say, out.csv) with the same structure as file1.csv but with the "page" column from file2.csv, i.e.:
productcode; *useless columns* ; page; startdate; enddate; *useless columns*
The join conditions are the same productcode and overlapping dates, i.e.:
file1.productcode == file2.productcode
and
!(file1.enddate < file2.startdate or file2.enddate < file1.startdate)
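As a quick sanity check of the overlap condition above, here is a minimal Python sketch. The function name `overlaps` and the sample dates are made up for illustration; it assumes both date ranges are inclusive at both ends.

```python
from datetime import date


def overlaps(start1, end1, start2, end2):
    """Return True when [start1, end1] and [start2, end2] share at least one day.

    Hypothetical helper: the ranges are assumed to be closed (inclusive) intervals.
    """
    # Two ranges do NOT overlap only if one ends before the other starts.
    return not (end1 < start2 or end2 < start1)


# Ranges that share the second half of January.
print(overlaps(date(2020, 1, 1), date(2020, 1, 31),
               date(2020, 1, 15), date(2020, 2, 15)))

# Ranges that are adjacent but never share a day.
print(overlaps(date(2020, 1, 1), date(2020, 1, 31),
               date(2020, 2, 1), date(2020, 2, 15)))
```

Under the inclusive assumption, two ranges that touch on a single shared day still count as overlapping.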
However, I have no idea how to do that. One possibility would be to import the two CSVs into MySQL, process them there, and then export the result to a final CSV file. However, that takes time and is resource-consuming.
I'm open to any suggestions.
Load them with pandas and use the .join() function to combine the two on the columns you need.
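A rough sketch of how the pandas suggestion might look. Note that `DataFrame.join` matches on the index by default, so this sketch swaps in `DataFrame.merge` on productcode and then filters the rows by the date-overlap condition from the question. The function name `join_on_overlap`, the tiny in-memory sample data, and the ISO date format are assumptions for illustration; the real files would be read with `pd.read_csv("file1.csv", sep=";", ...)` instead.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for file1.csv and file2.csv.
FILE1 = """productcode;startdate;enddate
A;2020-01-01;2020-01-31
B;2020-03-01;2020-03-31
"""
FILE2 = """productcode;page;startdate;enddate
A;10;2020-01-15;2020-02-15
A;11;2020-02-01;2020-02-28
B;20;2020-01-01;2020-01-31
"""


def join_on_overlap(df1, df2):
    """Add df2's 'page' column to df1 where productcode matches and dates overlap."""
    # Keep only the columns of file2 that the join needs.
    right = df2[["productcode", "page", "startdate", "enddate"]]
    # Pair every file1 row with every file2 row of the same productcode;
    # file2's date columns get the "_2" suffix.
    merged = df1.merge(right, on="productcode", suffixes=("", "_2"))
    # Keep the pairs whose date ranges overlap (same condition as in the question).
    no_overlap = (merged["enddate"] < merged["startdate_2"]) | (
        merged["enddate_2"] < merged["startdate"]
    )
    return merged.loc[~no_overlap].drop(columns=["startdate_2", "enddate_2"])


df1 = pd.read_csv(io.StringIO(FILE1), sep=";", parse_dates=["startdate", "enddate"])
df2 = pd.read_csv(io.StringIO(FILE2), sep=";", parse_dates=["startdate", "enddate"])

result = join_on_overlap(df1, df2)
# Write with the same ";" separator as the inputs.
print(result.to_csv(sep=";", index=False))
```

In this sketch the "page" column ends up last rather than before startdate, so the columns would need reordering to match the desired out.csv layout exactly. Also, the merge materializes every same-productcode pair before filtering, which may use a lot of memory on large files; reading one file in pieces with the `chunksize` parameter of `pd.read_csv` is one possible way to limit that.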