
Run Python script over multiple files in a folder

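I have multiple csv files in a folder (file1, file2, file3, file4, file5, ...).

I only know how to import one file, run the commands, and write out the converted file, as in the code below. I would like to run the commands over multiple csv files at once. Can anyone help?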

convert.py:

import pandas as pd
import numpy as np

#read file
df = pd.read_csv("file1.csv")

#make conversion
df['Time taken'] = pd.to_datetime(df['Time taken'])
df['Time taken'] = df['Time taken'].dt.hour + df['Time taken'].dt.minute / 60

#output file
df.to_csv('file1_converted.csv', index = False)

I started with the code below, but it only gives one output (*.csv), from what looks like a random csv file. I want a separate output for each file.

import glob
import pandas as pd
import numpy as np

files = glob.glob('folder/*.csv')
for file in files:
    df = pd.read_csv(file)

#make conversion
df['Time taken'] = pd.to_datetime(df['Time taken'])
df['Time taken'] = df['Time taken'].dt.hour + df['Time taken'].dt.minute / 60

#output file
df.to_csv('*.csv', index = False)

Indent the code that performs the DataFrame conversion so that it is inside the for loop, like this:

import glob
import os
import pandas as pd
import numpy as np

files = glob.glob('folder/*.csv')
for file in files:
    df = pd.read_csv(file)

    #make conversion
    df['Time taken'] = pd.to_datetime(df['Time taken'])
    df['Time taken'] = df['Time taken'].dt.hour + df['Time taken'].dt.minute / 60

    #output file: file is a path like 'folder/file1.csv', so prefix only the file name
    folder, name = os.path.split(file)
    df.to_csv(os.path.join(folder, 'updated_{}'.format(name)), index = False)
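
Because glob.glob('folder/*.csv') returns paths that include the folder part (for example folder/file1.csv), formatting the whole path into a new name puts the prefix in front of the folder rather than the file. The sketch below shows the difference using pathlib; the helper name output_path and its prefix default are illustrative, not part of the original answer.

```python
from pathlib import Path

def output_path(input_path, prefix="updated_"):
    """Return a path in the same folder as input_path, with prefix added to the file name."""
    p = Path(input_path)
    return p.with_name(prefix + p.name)

# Formatting the full path prefixes the folder, not the file name.
naive = "updated_{}".format("folder/file1.csv")

# Splitting the path first keeps the output next to the input.
fixed = output_path("folder/file1.csv")

print(naive)             # updated_folder/file1.csv
print(fixed.as_posix())  # folder/updated_file1.csv
```

Inside the loop, the result could then be passed straight to df.to_csv(output_path(file), index=False).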

You just need to indent the file-writing code so that it runs inside the loop; otherwise it will only write the last file:

import glob
import pandas as pd
import numpy as np

files = glob.glob('folder/*.csv')
for file in files:
    df = pd.read_csv(file)

    #make conversion
    df['Time taken'] = pd.to_datetime(df['Time taken'])
    df['Time taken'] = df['Time taken'].dt.hour + df['Time taken'].dt.minute / 60

    #output file: use a distinct name per input, e.g. folder/file1_converted.csv
    df.to_csv(file[:-len('.csv')] + '_converted.csv', index = False)

If the files are numbered consecutively (file1.csv, file2.csv, ...), you can also build each name inside a loop:

import pandas as pd

total_no_file=10

for i in range(total_no_file):
    file_name="file"+str(i+1)
    df = pd.read_csv(file_name+".csv")

    #make conversion
    df['Time taken'] = pd.to_datetime(df['Time taken'])
    df['Time taken'] = df['Time taken'].dt.hour + df['Time taken'].dt.minute / 60

    #output file, e.g. file1_converted.csv
    df.to_csv(file_name+"_converted.csv", index = False)

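So, your code has two problems. First, the indentation is all messed up, so the for loop only reads the different csv files into the same variable. Second, you should give each converted csv file that you write to disk a distinct name. The following should work for you: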

import os
import glob
import pandas as pd
import numpy as np

files = glob.glob('folder/*.csv')
for file in files:
    # Get the file name without folder or extension (splitext returns a (root, ext) tuple)
    file_name = os.path.splitext(os.path.basename(file))[0]
    df = pd.read_csv(file)

    #make conversion
    df['Time taken'] = pd.to_datetime(df['Time taken'])
    df['Time taken'] = df['Time taken'].dt.hour + df['Time taken'].dt.minute / 60

    #output file
    df.to_csv('{}_conv.csv'.format(file_name), index = False)
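
The indentation problem can be reproduced without pandas. In this minimal sketch the list of names and the variables processed_outside and processed_inside are made up for illustration: when the "work" line is not indented, the loop only rebinds the variable and the work runs once, on whatever item came last. Since glob.glob returns matches in arbitrary order, that last item can look like a random file.

```python
files = ['file1.csv', 'file2.csv', 'file3.csv']

# Work NOT indented: the loop just rebinds `current`,
# and the append runs a single time after the loop ends.
processed_outside = []
for name in files:
    current = name
processed_outside.append(current)

# Work indented: the append runs once per file.
processed_inside = []
for name in files:
    current = name
    processed_inside.append(current)

print(processed_outside)  # ['file3.csv']
print(processed_inside)   # ['file1.csv', 'file2.csv', 'file3.csv']
```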

