Run Python script over multiple files in a folder
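I have multiple csv files in a folder (file1, file2, file3, file4, file5, ...).
I only know how to import one file, run the commands, and output the converted file, as shown in the code below. I want to run the commands on multiple csv files at once. Can anyone help?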
convert.py:
import pandas as pd
import numpy as np
#read file
df = pd.read_csv("file1.csv")
#make conversion
df['Time taken'] = pd.to_datetime(df['Time taken'])
df['Time taken'] = df['Time taken'].dt.hour + df['Time taken'].dt.minute / 60
#output file
df.to_csv('file1_converted.csv', index = False)
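For reference, the conversion itself is just hours plus minutes divided by 60. A minimal sketch in plain Python, assuming the values are strings such as `01:30` (the question does not show the real data, so the `fmt` default and the helper name `to_decimal_hours` are hypothetical):

```python
from datetime import datetime

def to_decimal_hours(text, fmt="%H:%M"):
    """Convert a clock-style string such as '01:30' to hours as a float.

    `fmt` is an assumed input format; adjust it to match the real data.
    Only the hour and minute fields are used, as in the pandas version.
    """
    parsed = datetime.strptime(text, fmt)
    return parsed.hour + parsed.minute / 60

for sample in ("01:30", "02:15", "00:45"):
    print(sample, "->", to_decimal_hours(sample))
```

So a value of `01:30` becomes `1.5` hours, which is what the two `df['Time taken']` lines do for a whole column.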
I started with the code below, but it only produced a single output (*.csv) from one seemingly random csv file. I want a separate output for each file.
import glob
import pandas as pd
import numpy as np
files = glob.glob('folder/*.csv')
for file in files:
    df = pd.read_csv(file)
#make conversion
df['Time taken'] = pd.to_datetime(df['Time taken'])
df['Time taken'] = df['Time taken'].dt.hour + df['Time taken'].dt.minute / 60
#output file
df.to_csv('*.csv', index = False)
Indent the code that performs the dataframe conversion so that it is inside the for loop, like this:
import glob
import os
import pandas as pd
import numpy as np
files = glob.glob('folder/*.csv')
for file in files:
    df = pd.read_csv(file)
    #make conversion
    df['Time taken'] = pd.to_datetime(df['Time taken'])
    df['Time taken'] = df['Time taken'].dt.hour + df['Time taken'].dt.minute / 60
    #output file: prefix the file name only, so the result stays in the same folder
    folder, name = os.path.split(file)
    df.to_csv(os.path.join(folder, 'updated_{}'.format(name)), index = False)
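One thing to watch for: `glob.glob('folder/*.csv')` returns paths that include the `folder/` prefix, so a prefix added to the whole string changes the directory name rather than the file name. A small sketch of adding the prefix to the file name only, using a temporary directory as a stand-in for `folder` (the helper name `prefixed_output` is made up for illustration):

```python
import glob
import os
import tempfile

def prefixed_output(path, prefix="updated_"):
    """Return `path` with `prefix` added to the file name only."""
    folder, name = os.path.split(path)
    return os.path.join(folder, prefix + name)

with tempfile.TemporaryDirectory() as folder:
    # Create two empty stand-in files.
    for name in ("file1.csv", "file2.csv"):
        open(os.path.join(folder, name), "w").close()

    # Each globbed path still carries the directory part.
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        print(path, "->", prefixed_output(path))
```

The output file then lands next to its input instead of in a non-existent `updated_folder` directory.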
You just need to indent the file-writing code so that it runs inside the loop; otherwise it will only write the last file:
import glob
import os
import pandas as pd
import numpy as np
files = glob.glob('folder/*.csv')
for file in files:
    df = pd.read_csv(file)
    #make conversion
    df['Time taken'] = pd.to_datetime(df['Time taken'])
    df['Time taken'] = df['Time taken'].dt.hour + df['Time taken'].dt.minute / 60
    #output file: one output per input, named after the input file
    df.to_csv(os.path.splitext(file)[0] + '_converted.csv', index = False)
#alternative: loop over consecutively numbered files (file1.csv ... file10.csv)
import pandas as pd
total_no_file=10
for i in range(total_no_file):
    file_name="file"+str(i+1)
    df = pd.read_csv(file_name+".csv")
    #make conversion
    df['Time taken'] = pd.to_datetime(df['Time taken'])
    df['Time taken'] = df['Time taken'].dt.hour + df['Time taken'].dt.minute / 60
    #output file, e.g. file1_converted.csv
    file_name="file"+str(i+1)+"_converted"
    df.to_csv(file_name+".csv", index = False)
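So there are two problems with your code. First, the indentation is all messed up, so the for loop only reads the different csv files into the same variable. Second, you should give each converted csv file that you write to disk a different name. The following should work for you: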
import os
import glob
import pandas as pd
import numpy as np
files = glob.glob('folder/*.csv')
for file in files:
    file_name = os.path.splitext(os.path.basename(file))[0] # Get the file name without extension
    df = pd.read_csv(file)
    #make conversion
    df['Time taken'] = pd.to_datetime(df['Time taken'])
    df['Time taken'] = df['Time taken'].dt.hour + df['Time taken'].dt.minute / 60
    #output file
    df.to_csv('{}_conv.csv'.format(file_name), index = False)
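The same idea can also be sketched with `pathlib`, which keeps the output next to the input and skips files that were already converted, so the script can be re-run safely. This is only one possible arrangement: the function names `output_path_for` and `convert_folder` and the `_converted` suffix are hypothetical, and the `Time taken` column is taken from the question.

```python
from pathlib import Path

def output_path_for(src, suffix="_converted"):
    """Return the sibling path 'name<suffix>.csv' for an input 'name.csv'."""
    src = Path(src)
    return src.with_name(src.stem + suffix + src.suffix)

def convert_folder(folder, column="Time taken", suffix="_converted"):
    """Convert `column` to decimal hours in every CSV file in `folder`.

    Returns the list of files that were written.
    """
    import pandas as pd  # imported lazily so output_path_for works without pandas

    written = []
    for src in sorted(Path(folder).glob("*.csv")):
        if src.stem.endswith(suffix):
            continue  # skip outputs left over from an earlier run
        df = pd.read_csv(src)
        times = pd.to_datetime(df[column])
        df[column] = times.dt.hour + times.dt.minute / 60
        dst = output_path_for(src, suffix)
        df.to_csv(dst, index=False)
        written.append(dst)
    return written
```

Calling `convert_folder('folder')` would then write `folder/file1_converted.csv`, `folder/file2_converted.csv`, and so on, one output per input file.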