
Python - for all CSV files in a folder, make a new CSV file with the header names and the file they came from

I have code that writes to five CSV files, and after all of the CSV files are created, I would like to run a function that puts all of the headers into a CSV or XLSX file where each row represents a header in a file.

So, in a folder called "Example" there are 5 CSV files, called "1.csv", "2.csv" ... "5.csv". The code I would like to have would create a new file called "Headers of files in Example", where the first column is the name of the CSV file each header came from, and the second column contains the headers themselves. The headers of 1.csv are a, b, c and so on.
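For example, the first few rows of the new file would look something like this (the rows for 2.csv onwards follow the same pattern):

1.csv,a
1.csv,b
1.csv,c
...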

My Python coding is fairly basic at this point, but I definitely think what I would like to do is possible. Any suggestions would be greatly appreciated!

After some more digging I was able to find some code that, after some slight modifications, did what I wanted:

import csv
import glob
import pandas as pd


def headers():
    path = r'path to folder containing csv files/'
    all_files = glob.glob(path + "*.csv")
    myheaders = ['filename', 'header']
    with open("Headers of foldername.csv", "w", newline='') as fw:
        cw = csv.writer(fw, delimiter=",")
        for filename in all_files:
            with open(filename, 'r') as f:
                cr = csv.reader(f)
                # the first row of each csv holds its column names
                for column_name in (x.strip() for x in next(cr)):
                    cw.writerow([filename, column_name])
    # re-read the finished (now closed and flushed) file and add the header row;
    # header=None stops pandas from consuming the first data row as column names
    file = pd.read_csv("Headers of foldername.csv", header=None)
    file.to_csv("Headers of foldername.csv", header=myheaders, index=False)
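With the path placeholder at the top pointed at the real folder, I just call the function; something like this at the bottom of the script works as a quick sanity check (the print is only there to look at the result):

if __name__ == "__main__":
    headers()
    print(pd.read_csv("Headers of foldername.csv"))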

Given that you already have the DataFrames in memory, you just need to build a new DataFrame. I like to use a dictionary of lists to create it: for each file/DataFrame you extract the columns and append them to that dictionary, and later you can save the new DataFrame to a file.

import pandas as pd

# list_of_files holds the DataFrames already loaded in memory,
# list_of_names holds the matching file names, in the same order
summary_df = {
    'file_name': list(),
    'headers': list()}

for file, filename in zip(list_of_files, list_of_names):
    aux_headers = file.columns.to_list()
    summary_df['headers'] += aux_headers
    summary_df['file_name'] += [filename] * len(aux_headers)

summary_df = pd.DataFrame(summary_df)
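If the DataFrames are not in memory yet, a rough end-to-end sketch could look like this (the "Example" folder and the output file name are just placeholders matching the question):

import glob
import pandas as pd

# load every csv in the folder; the paths double as the first column of the summary
list_of_names = sorted(glob.glob("Example/*.csv"))
list_of_files = [pd.read_csv(name) for name in list_of_names]

summary_df = {'file_name': [], 'headers': []}
for file, filename in zip(list_of_files, list_of_names):
    aux_headers = file.columns.to_list()
    summary_df['headers'] += aux_headers
    summary_df['file_name'] += [filename] * len(aux_headers)

pd.DataFrame(summary_df).to_csv("Headers of files in Example.csv", index=False)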

I hope this piece of code helps. Essentially it iterates over all the files you want (their names are listed in file_names) and reads them with pandas. Once each CSV is loaded, you extract its headers with df.columns and store them in a list, which is then saved as a new CSV by pandas.

import pandas as pd

header_names = []
file_names = ['1.csv', '2.csv']
for file_name in file_names:
    df = pd.read_csv(file_name)
    header_names.extend(list(df.columns))  # collect this file's column names

new_df = pd.DataFrame(header_names)
new_df.to_csv("headers.csv")
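If you also want the first column to say which file each header came from, as in the question, one way (the column names here are just a suggestion) is to collect (filename, header) pairs instead:

import pandas as pd

rows = []
file_names = ['1.csv', '2.csv']
for file_name in file_names:
    df = pd.read_csv(file_name)
    rows.extend((file_name, column) for column in df.columns)

new_df = pd.DataFrame(rows, columns=['filename', 'header'])
new_df.to_csv("headers.csv", index=False)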
