How to concatenate multiple csv files into one based on column names without having to type every column header in code

I am relatively new to Python (about a week's experience) and I can't seem to find the answer to my problem.

I am trying to merge hundreds of CSV files in my folder Data into a single CSV file, with the data aligned by column name.

The solutions I have found require me to type out either each file name or the column headers, which would take days.

I used this code to create one CSV file, but the column names move around, so the data is not in the same columns across the whole DataFrame:

import pandas as pd
import glob
import os
def concatenate(indir=r"C:\\Users\ge\Documents\d\de", 
outfile = r"C:\Users\ge\Documents\d"):
    os.chdir(indir)
    fileList=glob.glob("*.csv")
    dfList = []
    for filename in fileList:
        print(filename)
        df = pd.read_csv(filename, header = None)
        dfList.append(df)
        concatDf = pd.concat(dfList, axis = 0)
    concatDf.to_csv(outfile, index= None)

Is there a quick way to do this? I have less than a week to run statistics on the dataset.

Any help would be appreciated.

I am not sure I understand your problem correctly, but this is one way to merge your files without typing any column names:

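import glob
import os

import pandas as pd


def concatenate(indir):
    # Collect every CSV in the folder, skipping the output of a previous run.
    file_list = sorted(
        f for f in glob.glob(os.path.join(indir, "*.csv"))
        if os.path.basename(f) != "_output.csv"
    )
    # read_csv takes each file's first row as the header (the default),
    # so concat lines the data up by column name rather than by position.
    output = pd.concat(
        [pd.read_csv(f) for f in file_list], ignore_index=True, sort=False
    )
    # Written into the input folder, as before.
    output.to_csv(os.path.join(indir, "_output.csv"), index=False)


concatenate(indir=r"C:\\Users\gerardchurch\Documents\Data\dev_en")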
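
To see why reading the header row matters, here is a minimal sketch using two small in-memory CSVs (the data and the names `csv_a` and `csv_b` are made up for illustration). With the default header handling, `pd.concat` matches columns by name even when the files list them in a different order. With `header=None`, as in the question, the header row is read as data and the columns are matched by position instead.

```python
import io

import pandas as pd

# Two hypothetical CSVs: same columns in a different order,
# plus one column that appears only in the second file.
csv_a = "name,age\nAnn,30\n"
csv_b = "age,name,city\n25,Bob,Oslo\n"

frames = [pd.read_csv(io.StringIO(text)) for text in (csv_a, csv_b)]

# concat matches columns by name; values a file lacks become NaN.
merged = pd.concat(frames, ignore_index=True, sort=False)
print(merged)
```

Rows from both files end up under the right headers, and the `city` cell for the first file is left empty (NaN) rather than shifting other values over.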

Here is one memory-efficient way to do that.

from pathlib import Path
import csv

indir = Path(r'C:\\Users\gerardchurch\Documents\Data\dev_en')
outfile = Path(r"C:\\Users\gerardchurch\Documents\Data\output.csv")


def find_header_from_all_files(indir):
    columns = set()
    print("Looking for column names in", indir)
    for f in indir.glob('*.csv'):
        with f.open(newline='') as sample_csv:
            sample_reader = csv.DictReader(sample_csv)
            try:
                first_row = next(sample_reader)
            except StopIteration:
                print("File {} doesn't contain any data. Double check this".format(f))
                continue
            else:
                columns.update(first_row.keys())
    return columns


columns = find_header_from_all_files(indir)
print("The columns are:", sorted(columns))

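# newline='' keeps the csv module from writing blank lines between rows on Windows.
with outfile.open('w', newline='') as outf:
    # sorted() gives a stable column order; a bare set has no defined order.
    wr = csv.DictWriter(outf, fieldnames=sorted(columns))
    wr.writeheader()
    for inpath in indir.glob('*.csv'):
        print("Parsing", inpath)
        with inpath.open(newline='') as infile:
            # Rows are streamed one at a time, so no file is loaded whole.
            reader = csv.DictReader(infile)
            wr.writerows(reader)
print("Done, find the output at", outfile)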

This should handle the case where one of the input CSVs doesn't contain all of the columns.
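
As a rough illustration of that behaviour, the sketch below runs the same two-pass idea on two in-memory inputs (the sample data is invented). It reads `fieldnames` directly instead of the first data row, which is a small variation on the code above, and relies on `csv.DictWriter` filling any key a row lacks with `restval`.

```python
import csv
import io

# Two hypothetical inputs; the second one has no "city" column.
inputs = [
    "name,city\nAnn,Oslo\n",
    "name,age\nBob,25\n",
]

# Pass 1: collect the union of header names across all inputs.
columns = set()
for text in inputs:
    columns.update(csv.DictReader(io.StringIO(text)).fieldnames or [])

# Pass 2: stream rows into one writer; absent keys are filled with restval.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=sorted(columns), restval="")
writer.writeheader()
for text in inputs:
    writer.writerows(csv.DictReader(io.StringIO(text)))

result = out.getvalue()
print(result)
```

Each row keeps its values under the matching header, and the columns a file does not have are left blank in that file's rows.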
