
Python: merge csv files with different column subsets

I have hundreds of large CSV files that I would like to merge into one. However, not all of the CSV files contain all columns, so I need to merge based on column name, not column position.

In the merged CSV, values should be empty for a cell coming from a line which did not have the column of that cell.

I cannot use the pandas module, because it makes me run out of memory.

Is there a module that can do that, or some easy code?

I am providing below the code to generate 2 csv files. What I would like is to merge tempdf1.csv and tempdf2.csv in a way that gets me tempdf3.csv.

import pandas as pd

df1=pd.DataFrame([{"Location":"A","Temperature":20,"Weather":"Fair", "Wind":"", "Latitude":44},{"Location":"B","Temperature":"","Weather":"Bad","Wind":"","Latitude":42}])
df2=pd.DataFrame([{"Location":"C","Temperature":14,"Weather":"","Longitude":12, "Wind":44},{"Location":"D","Temperature":"","Weather":"","Wind":0,"Longitude":11}])
df1.to_csv("C:/tempdf1.csv")
df2.to_csv("C:/tempdf2.csv")

df3=pd.DataFrame([{"Location":"A","Longitude":"","Temperature":20,"Weather":"Fair", "Wind":"", "Latitude":44},{"Location":"B","Longitude":"","Temperature":"","Weather":"Bad","Wind":"","Latitude":42},{"Location":"C","Temperature":14,"Weather":"","Longitude":12, "Wind":44,"Latitude":""},{"Location":"D","Temperature":"","Weather":"","Wind":0,"Longitude":11, "Latitude":""}])
df3.to_csv("C:/tempdf3.csv")

Late is still better than never :) Have a look at the convtools library, which provides lots of data processing primitives, is pure Python and relies on code generation. See its table processing docs.

from convtools import conversion as c
from convtools.contrib.tables import Table

# into_* methods can only be called once, because a Table processes
# a stream and cannot assume it can be read twice
Table.from_csv("tempdf1.csv", header=True).chain(
    # header=True belongs to from_csv, not to chain
    Table.from_csv("tempdf2.csv", header=True)
).into_csv("tempdf3.csv")
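
If installing a third-party library is not an option, the same merge by column name can probably be done with just the standard library csv module. The following is only a sketch, not a tested solution for your real data: the function name merge_csv_by_column is made up for illustration, and it assumes every file has a header row and that the union of column names (not the rows) fits in memory. It reads the headers in a first pass, then streams the rows one at a time in a second pass, so memory use should stay small.

```python
import csv


def merge_csv_by_column(input_paths, output_path):
    """Merge CSV files by column name; missing cells are left empty.

    Hypothetical helper for illustration. Assumes each input file
    starts with a header row.
    """
    # Pass 1: read only the header row of each file and build the
    # union of column names, keeping first-seen order.
    fieldnames = []
    seen = set()
    for path in input_paths:
        with open(path, newline="") as f:
            header = next(csv.reader(f), [])
        for name in header:
            if name not in seen:
                seen.add(name)
                fieldnames.append(name)

    # Pass 2: stream rows one at a time; restval="" fills the cells
    # of columns that a given input file does not have.
    with open(output_path, "w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
        writer.writeheader()
        for path in input_paths:
            with open(path, newline="") as f:
                for row in csv.DictReader(f):
                    writer.writerow(row)
    return fieldnames
```

For the files in the question this would be called as merge_csv_by_column(["tempdf1.csv", "tempdf2.csv"], "tempdf3.csv"). Note that the columns come out in first-seen order rather than the order pandas would produce, and that a malformed row with more fields than its header would make DictWriter raise an error.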

