I have multiple CSV files in a folder (file1, file2, file3, file4, file5, ...).
I only know how to import one file, run the conversion, and write the converted file, as shown in the code below. I would like to run the same conversion on multiple CSV files at once. Can someone please help?
convert.py:
import pandas as pd
import numpy as np
#read file
df = pd.read_csv("file1.csv")
#make conversion
df['Time taken'] = pd.to_datetime(df['Time taken'])
df['Time taken'] = df['Time taken'].dt.hour + df['Time taken'].dt.minute / 60
#output file
df.to_csv('file1_converted.csv', index = False)
I started with the code shown below, but it gave only one output (*.csv) from one random CSV file. I would like a separate output for each file.
import glob
import pandas as pd
import numpy as np
files = glob.glob('folder/*.csv')
for file in files:
    df = pd.read_csv(file)
#make conversion
df['Time taken'] = pd.to_datetime(df['Time taken'])
df['Time taken'] = df['Time taken'].dt.hour + df['Time taken'].dt.minute / 60
#output file
df.to_csv('*.csv', index = False)
Indent the code that does the DataFrame transformation so that it is inside the for loop, like this:
import glob
import os
import pandas as pd
import numpy as np
files = glob.glob('folder/*.csv')
for file in files:
    df = pd.read_csv(file)
    #make conversion
    df['Time taken'] = pd.to_datetime(df['Time taken'])
    df['Time taken'] = df['Time taken'].dt.hour + df['Time taken'].dt.minute / 60
    #output file, named after the input file without its folder prefix
    df.to_csv('updated_{}'.format(os.path.basename(file)), index = False)
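One detail about the output name: glob.glob('folder/*.csv') returns paths that include the folder prefix, such as 'folder/file1.csv', so a prefix is best attached to the base name rather than to the whole path. A minimal sketch of that step follows; the helper name updated_name and its prefix argument are made up for illustration and are not part of pandas or glob.

```python
import os


def updated_name(path, prefix="updated_"):
    """Return prefix + the file's base name, dropping any directory part."""
    return prefix + os.path.basename(path)


# 'folder/file1.csv' becomes 'updated_file1.csv', written to the current directory
print(updated_name("folder/file1.csv"))
```

Writing the outputs outside the input folder also means a second run of the script will not pick up the already converted files.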
You just need to indent the file-writing code so that it runs inside the loop; otherwise only the last file is written. Each output also needs its own name, because to_csv does not expand '*.csv' as a pattern:
import glob
import os
import pandas as pd
import numpy as np
files = glob.glob('folder/*.csv')
for file in files:
    df = pd.read_csv(file)
    #make conversion
    df['Time taken'] = pd.to_datetime(df['Time taken'])
    df['Time taken'] = df['Time taken'].dt.hour + df['Time taken'].dt.minute / 60
    #output file, e.g. file1.csv -> file1_converted.csv
    df.to_csv(os.path.basename(file).replace('.csv', '_converted.csv'), index = False)
import pandas as pd
total_no_file=10
for i in range(total_no_file):
    file_name="file"+str(i+1)
    df = pd.read_csv(file_name+".csv")
    #make conversion
    df['Time taken'] = pd.to_datetime(df['Time taken'])
    df['Time taken'] = df['Time taken'].dt.hour + df['Time taken'].dt.minute / 60
    #output file, e.g. file1_converted.csv
    df.to_csv(file_name+"_converted.csv", index = False)
There were two problems with your code. First, the indentation was wrong, so the for loop only read each CSV file into the same variable, and the conversion and output ran once, on the last file read. Second, you should give each converted CSV file a different name when writing it to disk. The following should work for you:
import os
import glob
import pandas as pd
import numpy as np
files = glob.glob('folder/*.csv')
for file in files:
    # Get the file name without folder or extension (splitext returns a (root, ext) tuple)
    file_name = os.path.splitext(os.path.basename(file))[0]
    df = pd.read_csv(file)
    #make conversion
    df['Time taken'] = pd.to_datetime(df['Time taken'])
    df['Time taken'] = df['Time taken'].dt.hour + df['Time taken'].dt.minute / 60
    #output file
    df.to_csv('{}_conv.csv'.format(file_name), index = False)
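The same loop can also be wrapped in a function, which makes it easier to reuse and to point at a separate output folder. This is only a sketch under a few assumptions: the function name convert_folder, its parameters, and the separate dst_dir are my own choices, and the 'Time taken' column is assumed to hold values that pd.to_datetime can parse.

```python
from pathlib import Path

import pandas as pd


def convert_folder(src_dir, dst_dir, column="Time taken"):
    """Convert `column` to decimal hours in every CSV found in src_dir.

    Writes one '<name>_conv.csv' per input file into dst_dir and
    returns the list of output paths.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for path in sorted(src_dir.glob("*.csv")):
        df = pd.read_csv(path)
        # same conversion as above: hours plus minutes as a fraction of an hour
        times = pd.to_datetime(df[column])
        df[column] = times.dt.hour + times.dt.minute / 60
        # path.stem is the file name without folder or extension
        out_path = dst_dir / f"{path.stem}_conv.csv"
        df.to_csv(out_path, index=False)
        written.append(out_path)
    return written
```

It could then be called as convert_folder('folder', 'converted'). Keeping dst_dir different from src_dir avoids re-converting the '_conv.csv' files if the script is run again.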