
Python: merge CSV files with different column subsets

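I have hundreds of large CSV files that I would like to merge into one. However, not every CSV file contains all of the columns, so I need to merge based on column name rather than column position.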

In the merged CSV, a cell should be empty whenever its row comes from a file that does not have that cell's column.
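To make the "merge by name, blank when missing" rule concrete, here is a minimal sketch of the two steps involved: build the union of all column names (first-seen order), then place each row's values under those names, using an empty string for columns the row's file lacks. The helper names `union_header` and `align` and the sample rows are hypothetical; the rows are shaped like what `csv.DictReader` would return for one row of each sample file.

```python
from itertools import chain

def union_header(*headers):
    """Union of column lists in first-seen order (dict keys keep insertion order)."""
    return list(dict.fromkeys(chain.from_iterable(headers)))

def align(row, header):
    """Place each value under its column *name*; columns the row lacks become ''."""
    return [row.get(name, "") for name in header]

# Hypothetical rows, shaped like csv.DictReader output for one row of each sample file.
row_from_file1 = {"Location": "A", "Temperature": "20", "Weather": "Fair", "Wind": "", "Latitude": "44"}
row_from_file2 = {"Location": "C", "Temperature": "14", "Weather": "", "Longitude": "12", "Wind": "44"}

header = union_header(row_from_file1.keys(), row_from_file2.keys())
# header → ['Location', 'Temperature', 'Weather', 'Wind', 'Latitude', 'Longitude']
aligned = [align(r, header) for r in (row_from_file1, row_from_file2)]
# aligned → [['A', '20', 'Fair', '', '44', ''], ['C', '14', '', '44', '', '12']]
```

Because alignment looks values up by key, the column order inside each source file does not matter (note that file 2 lists `Longitude` before `Wind`).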

I can't use the pandas module because it makes me run out of memory.

Is there a module that can do this, or some simple code?

The code below generates two CSV files. What I want is to merge tempdf1.csv and tempdf2.csv in a way that produces tempdf3.csv.

import pandas as pd

# tempdf1 has Latitude but no Longitude; tempdf2 has Longitude but no Latitude
df1=pd.DataFrame([{"Location":"A","Temperature":20,"Weather":"Fair", "Wind":"", "Latitude":44},{"Location":"B","Temperature":"","Weather":"Bad","Wind":"","Latitude":42}])
df2=pd.DataFrame([{"Location":"C","Temperature":14,"Weather":"","Longitude":12, "Wind":44},{"Location":"D","Temperature":"","Weather":"","Wind":0,"Longitude":11}])
# index=False keeps pandas from writing its row index as an extra unnamed column
df1.to_csv("C:/tempdf1.csv", index=False)
df2.to_csv("C:/tempdf2.csv", index=False)

# Desired result: the union of all columns, empty where a source file lacked the column
df3=pd.DataFrame([{"Location":"A","Longitude":"","Temperature":20,"Weather":"Fair", "Wind":"", "Latitude":44},{"Location":"B","Longitude":"","Temperature":"","Weather":"Bad","Wind":"","Latitude":42},{"Location":"C","Temperature":14,"Weather":"","Longitude":12, "Wind":44,"Latitude":""},{"Location":"D","Temperature":"","Weather":"","Wind":0,"Longitude":11, "Latitude":""}])
df3.to_csv("C:/tempdf3.csv", index=False)

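Better late than never :) Take a look at the convtools library. It provides a large set of data-processing primitives, is pure Python, and relies on code generation. See its table-processing docs.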

from convtools.contrib.tables import Table

# into_* methods can only be called once, because they process
# a stream and cannot assume it can be read twice.
# header=True belongs to each from_csv() call, not to chain();
# chain() lines the tables up by column name.
Table.from_csv("tempdf1.csv", header=True).chain(
    Table.from_csv("tempdf2.csv", header=True)
).into_csv("tempdf3.csv")
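For comparison, the same merge can be sketched with only the standard library's `csv` module. This is a minimal sketch, assuming every input file has a header row and the output file is not itself among the inputs; the function name `merge_csv_by_header` is hypothetical. It makes two passes: the first reads only the header line of each file to build the union of column names, and the second streams rows one at a time through `csv.DictWriter`, whose `restval=""` fills the columns a file lacks. Memory use therefore depends on the number of columns, not the number of rows.

```python
import csv
from itertools import chain

def merge_csv_by_header(input_paths, output_path):
    """Merge CSV files by column name, streaming one row at a time."""
    # Pass 1: read only each file's header line; keep column names in first-seen order.
    headers = []
    for path in input_paths:
        with open(path, newline="") as f:
            headers.append(next(csv.reader(f), []))
    fieldnames = list(dict.fromkeys(chain.from_iterable(headers)))

    # Pass 2: copy rows by name; restval="" leaves missing columns empty.
    with open(output_path, "w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
        writer.writeheader()
        for path in input_paths:
            with open(path, newline="") as f:
                for row in csv.DictReader(f):
                    writer.writerow(row)
```

For hundreds of files, the input list could come from `glob`, e.g. `merge_csv_by_header(sorted(glob.glob("data/*.csv")), "merged.csv")`, where the `data` folder is a hypothetical location kept separate from the output file.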

