简体   繁体   中英

Run Python script over multiple files in a folder

I have multiple csv files in a folder (file1, file2, file3, file4, file5,....)

I only know how to import one file, run the command and output the converted file as shown in below code. I would like to run the command in multiple csv files at once. Can someone please help?

convert.py:

import pandas as pd
import numpy as np

#read file
df = pd.read_csv("file1.csv")

#make conversion
df['Time taken'] = pd.to_datetime(df['Time taken'])
df['Time taken'] = df['Time taken'].dt.hour + df['Time taken'].dt.minute / 60

#output file
df.to_csv('file1_converted.csv', index = False)

I started with a code as shown below but it gave only one output(*.csv) from one random csv file. I would like separate output for each file.

import glob
import pandas as pd
import numpy as np

files = glob.glob('folder/*.csv')
for file in files:
    df = pd.read_csv(file)

#make conversion
df['Time taken'] = pd.to_datetime(df['Time taken'])
df['Time taken'] = df['Time taken'].dt.hour + df['Time taken'].dt.minute / 60

#output file
df.to_csv('*.csv', index = False)

indent the code that does the dataframe transformation and include it in the for loop like this:

import glob
import pandas as pd
import numpy as np

files = glob.glob('folder/*.csv')
for file in files:
    df = pd.read_csv(file)

    #make conversion
    df['Time taken'] = pd.to_datetime(df['Time taken'])
    df['Time taken'] = df['Time taken'].dt.hour + df['Time taken'].dt.minute / 60

    #output file
    df.to_csv('updated_{}'.format(file), index = False)

You just need to indent the file writing code so it is performed inside the loop, otherwise it will only write the last file:

import glob
import pandas as pd
import numpy as np

files = glob.glob('folder/*.csv')
for file in files:
    df = pd.read_csv(file)

    #make conversion
    df['Time taken'] = pd.to_datetime(df['Time taken'])
    df['Time taken'] = df['Time taken'].dt.hour + df['Time taken'].dt.minute / 60

    #output file
    df.to_csv('*.csv', index = False)
total_no_file=10

for i in range(total_no_file):
    file_name="file"+str(i+1)
    df = pd.read_csv(file_name)

    #make conversion
    df['Time taken'] = pd.to_datetime(df['Time taken'])
    df['Time taken'] = df['Time taken'].dt.hour + df['Time taken'].dt.minute / 60

    file_name="file"+str(i+1)+"_converted"
    df.to_csv('file1_converted.csv', index = False)

So there were two problems with your code. First the indentation was all messed up so the for loop was only reading the different csv files into the same variable. Second you should give a different name to the converted csv files you are writing to disk. Therefore, the following should work for you:

import os
import glob
import pandas as pd
import numpy as np

files = glob.glob('folder/*.csv')
for file in files:
    file_name = os.path.splitext(os.path.basename(file))  # Get the file name without extension
    df = pd.read_csv(file)

    #make conversion 
    df['Time taken'] = pd.to_datetime(df['Time taken'])
    df['Time taken'] = df['Time taken'].dt.hour + df['Time 
    taken'].dt.minute / 60

    #output file
    df.to_csv('{}_conv.csv'.format(file_name, index = False)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM