Python: merge csv files with different column subsets
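I have hundreds of large CSV files that I would like to merge into one. However, not all of the CSV files contain all of the columns, so I need to merge them by column name rather than by column position.
In the merged CSV, a cell should be empty when its row comes from a file that does not have that column.
I can't use the pandas module, because it makes me run out of memory.
Is there a module that can do this, or some simple code?
The code below generates two CSV files. I want to merge tempdf1.csv and tempdf2.csv so that the result is tempdf3.csv.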
import pandas as pd

# Two sample inputs: df1 has Latitude but no Longitude, df2 has the reverse
df1=pd.DataFrame([{"Location":"A","Temperature":20,"Weather":"Fair", "Wind":"", "Latitude":44},{"Location":"B","Temperature":"","Weather":"Bad","Wind":"","Latitude":42}])
df2=pd.DataFrame([{"Location":"C","Temperature":14,"Weather":"","Longitude":12, "Wind":44},{"Location":"D","Temperature":"","Weather":"","Wind":0,"Longitude":11}])
df1.to_csv("C:/tempdf1.csv")
df2.to_csv("C:/tempdf2.csv")

# Desired result: the union of all columns, empty where a source file lacked the column
df3=pd.DataFrame([{"Location":"A","Longitude":"","Temperature":20,"Weather":"Fair", "Wind":"", "Latitude":44},{"Location":"B","Longitude":"","Temperature":"","Weather":"Bad","Wind":"","Latitude":42},{"Location":"C","Temperature":14,"Weather":"","Longitude":12, "Wind":44,"Latitude":""},{"Location":"D","Temperature":"","Weather":"","Wind":0,"Longitude":11, "Latitude":""}])
df3.to_csv("C:/tempdf3.csv")
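Better late than never :) Take a look at the convtools library: it provides lots of data-processing primitives, is pure Python, and relies on code generation (see its "Table processing" docs).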
from convtools.contrib.tables import Table

# into_* methods can only be called once, because they process
# a stream and cannot assume it can be read twice.
# header=True must be passed to each from_csv() call, not to chain().
Table.from_csv("tempdf1.csv", header=True).chain(
    Table.from_csv("tempdf2.csv", header=True)
).into_csv("tempdf3.csv")
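If adding a dependency is not an option, the same by-name merge can be sketched with only the standard library `csv` module. This is a minimal sketch, not part of convtools: `merge_csvs` is a hypothetical helper name, and it assumes every file has a header row and is UTF-8 encoded. It reads each file twice: once for the header line only (to build the union of column names), and once to stream rows through `csv.DictWriter`, whose `restval=""` fills in columns a file lacks. Only the header union is held in memory, never the data.

```python
import csv
from typing import Iterable, List


def merge_csvs(paths: Iterable[str], out_path: str) -> List[str]:
    """Merge CSV files by column name (hypothetical helper, stdlib only).

    Assumes each input has a header row and is UTF-8 encoded.
    Returns the merged column order (first-seen order across files).
    """
    paths = list(paths)
    fieldnames: List[str] = []
    seen = set()

    # Pass 1: read only the header line of each file to build the column union.
    for path in paths:
        with open(path, newline="", encoding="utf-8") as f:
            header = next(csv.reader(f), [])  # [] for an empty file
        for name in header:
            if name not in seen:
                seen.add(name)
                fieldnames.append(name)

    # Pass 2: stream rows one at a time; columns missing from a file
    # are written as empty cells via restval="".
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
        writer.writeheader()
        for path in paths:
            with open(path, newline="", encoding="utf-8") as f:
                for row in csv.DictReader(f):
                    writer.writerow(row)

    return fieldnames
```

Usage would be `merge_csvs(["tempdf1.csv", "tempdf2.csv"], "tempdf3.csv")`. Files written by `df.to_csv()` without `index=False` start with an unnamed index column (empty header); this sketch treats it like any other column, so it appears once in the output. A row with more fields than its header would make `DictWriter` raise `ValueError`, which is a reasonable failure for malformed input.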