
Convert multiple xlsm files automatically to multiple csv files by using pandas

I have 300 raw data files (.xlsm) and want to extract the useful data from each one and save it as a CSV file, to be used as input for a neural network. For now I am testing with 10 files. I have successfully extracted the information I need, but I don't know how to write each result to a CSV file with the same name as its source file. For a single file we can use df.to_csv, but how do I do this for all the files? With a for loop?

    import glob
    import pandas as pd
    import numpy as np
    import csv
    import os

    excel_files = glob.glob('../../Versuch/Versuche/RohBeispiel/*.xlsm') 
    directory = '/Beispiel'
    for files in excel_files:
        data = pd.read_excel(files)
        # getting the list of rows and columns you need
        list_of_dfs = pd.DataFrame(data.values[0:600:,12:26], 
                                   columns=data.columns[12:26]).drop(['Sauberkeit', 'Temparatur'], axis=1)
        # converting pandas dataframe columns to numeric: string into float
        cols = ['KonzA', 'KonzB', 'KonzC', 'TempA', 
                'TempB', 'TempC', 'Modul1', 'Modul2', 
                'Modul3', 'Modul4', 'Modul5', 'Modul6']
        list_of_dfs[cols] = list_of_dfs[cols].apply(pd.to_numeric, errors='coerce', axis=1)
        # Filling down from a column through missing data
        for fec in list_of_dfs[cols]:
            list_of_dfs[fec].fillna(method='ffill', inplace=True)       

        csvfilename = files.split('/')[-1].split('.')[0] + '.csv'
        newtempfile = os.path.join(directory,csvfilename)
        print(newtempfile)
        print(list_of_dfs.head(2))

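The problem is solved. Adding the following at the end of the loop body writes each DataFrame to a CSV file named after its source file: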

folder_name = 'Beispiel'
csvfilename = files.split('/')[-1].split('.')[0] + '.csv'  # change into csv files
newtempfile = os.path.join(folder_name, csvfilename)

# Verify if directory exists
if not os.path.exists(folder_name):
    os.makedirs(folder_name)  # If not, create it

print(newtempfile)
list_of_dfs.to_csv(newtempfile, index=False)
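
The existence check followed by os.makedirs can be collapsed into a single call with exist_ok=True, which is also safe to repeat on every loop iteration. A minimal sketch, where the helper name ensure_output_dir is hypothetical and a temporary directory stands in for the real output location:

```python
import os
import tempfile

def ensure_output_dir(folder_name):
    """Create folder_name (and any parents) if missing; do nothing if it exists."""
    os.makedirs(folder_name, exist_ok=True)
    return folder_name

# Demonstration in a throwaway location rather than the real 'Beispiel' folder
base = tempfile.mkdtemp()
target = os.path.join(base, "Beispiel")
ensure_output_dir(target)
ensure_output_dir(target)  # second call is a no-op and raises no error
```

In the loop this would replace the if/makedirs pair, or it could be called once before the loop starts.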

The easiest way to do this is to take the filename of each Excel file and then use os.path.join() to build the output path in the directory you want.

directory = "C:/Test"
for files in excel_files:
    # basename strips the directory part; then swap the extension
    csvfilename = os.path.basename(files).replace('.xlsm', '.csv')
    newtempfile = os.path.join(directory, csvfilename)

Since you already have the DataFrame you want to write to the CSV file, just add the code above to your loop and pass 'newtempfile' to to_csv as the output path; that should do it.

df.to_csv(newtempfile, index=False)  # df is the DataFrame built inside the loop
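
Replacing '.xlsm' in the name works for these files. As a slightly more general alternative, os.path.splitext only touches the final suffix, so a name that happens to contain '.xlsm' elsewhere is left alone. A small sketch (csv_path_for is a hypothetical helper name, and the sample file name is made up for illustration):

```python
import os

def csv_path_for(xlsm_path, out_dir):
    """Map an input .xlsm path to a .csv path with the same base name in out_dir."""
    base = os.path.basename(xlsm_path)      # drop the directory part
    stem, _ext = os.path.splitext(base)     # split off only the last extension
    return os.path.join(out_dir, stem + ".csv")

# Illustrative input; the file name 'versuch_01.xlsm' is an assumption
result = csv_path_for("../../Versuch/Versuche/RohBeispiel/versuch_01.xlsm", "Beispiel")
print(result)
```

The returned path can be passed straight to to_csv in place of newtempfile.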

Hope this helps. :)

Updated Code:

    cols = ['KonzA', 'KonzB', 'KonzC', 'TempA',
            'TempB', 'TempC', 'Modul1', 'Modul2',
            'Modul3', 'Modul4', 'Modul5', 'Modul6']
    directory = "C:/Test"  # output folder, as above
    excel_files = glob.glob('../../Versuch/Versuche/RohBeispiel/*.xlsm')
    for file in excel_files:
        # import only the columns you need into the dataframe
        data = pd.read_excel(file, usecols=cols)
        csvfilename = os.path.basename(file).replace('.xlsm', '.csv')
        newtempfile = os.path.join(directory, csvfilename)

        # converting pandas dataframe columns to numeric: string into float
        data[cols] = data[cols].apply(pd.to_numeric, errors='coerce')
        # fill missing values down each column; assign back instead of inplace
        data[cols] = data[cols].ffill()
        data.to_csv(newtempfile, index=False)
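
The coerce-then-forward-fill step can be checked on a small in-memory frame without any Excel files. This is only a sketch: clean_numeric is a hypothetical helper name and the sample values are invented, but the column names come from the question.

```python
import pandas as pd

def clean_numeric(df, cols):
    """Return a copy of df with cols converted to numbers and gaps filled downward."""
    out = df.copy()
    # Unparseable strings and None become NaN instead of raising an error
    out[cols] = out[cols].apply(pd.to_numeric, errors="coerce")
    # Forward fill: each NaN takes the last valid value above it in the same column
    out[cols] = out[cols].ffill()
    return out

# Invented sample data for illustration only
sample = pd.DataFrame({
    "KonzA": ["1.5", "bad", "3.0"],
    "TempA": ["20", None, "22"],
})
cleaned = clean_numeric(sample, ["KonzA", "TempA"])
print(cleaned)
```

Note that a forward fill cannot fill a gap in the very first row, because there is no earlier value to copy from; such cells stay NaN.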


 