
How to loop through a file folder to run script for each item in folder?

I have a script that takes a sample from an Excel file and writes that sample back out as a CSV. How can I loop through a folder containing multiple Excel files so that I don't have to change the file name for every run of the script? I believe I can use glob, but that appears to merely merge all the Excel files together.

import pandas as pd
import glob

root_dir = r"C:\Users\bryanmccormack\Desktop\Test_Folder\*.xlsx"
excel_files = glob.glob(root_dir, recursive=True)

for xls in excel_files:
    df_excel = pd.read_excel(xls)
    df_excel = df_excel.loc[(df_excel['Track Item']=='Y')]

def sample_per(df_excel):
    if len(df_excel) <= 10000:
        return df_excel.sample(frac=0.05)
    elif len(df_excel) >= 15000:
        return df_excel.sample(frac=0.03)
    else:
        return df_excel.sample(frac=0.01)

final = sample_per(xls)

df_excel.loc[df_excel['Retailer Item ID'].isin(final['Retailer Item ID']), 'Track Item'] = 'Audit'

df_excel.to_csv('Testicle.csv',index=False)

This returns a list of all files in a directory (including its subdirectories) that you can iterate over:

from os import walk
from os.path import join

def retrieve_file_paths(dirName):       #Declare the function to return all file paths of the particular directory
    filepaths = []                      #setup file paths variable
    for root, directories, files in walk(dirName):   #Read all directory, subdirectories and file lists
        for filename in files:
            filepath = join(root, filename)     #Create the full filepath by using os module.
            filepaths.append(filepath)

    return filepaths      #return all paths
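Because the walk above returns every file it finds, a folder that also holds non-Excel files would feed those to the Excel reader. A minimal sketch of the same walk with an extension filter follows; the function name `retrieve_excel_paths` and its `extensions` parameter are illustrative assumptions, not part of the original answer.

```python
import os

def retrieve_excel_paths(dir_name, extensions=(".xlsx", ".xls")):
    """Return the sorted paths of all files under dir_name whose extension matches."""
    matches = []
    for root, _directories, files in os.walk(dir_name):
        for filename in files:
            # compare case-insensitively so "REPORT.XLSX" is found too
            if filename.lower().endswith(extensions):
                matches.append(os.path.join(root, filename))
    return sorted(matches)
```

Sorting the result is optional; it only makes the processing order predictable from run to run.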

In the end, it should look something like this:

import pandas as pd
from os import walk
from os.path import join

dirName = "/your/dir"

def sample_per(df2):
    if len(df2) <= 10000:
        return df2.sample(frac=0.05)
    elif len(df2) >= 15000:
        return df2.sample(frac=0.03)
    else:
        return df2.sample(frac=0.01)


def retrieve_file_paths(dirName):       #Declare the function to return all file paths of the particular directory
    filepaths = []                      #setup file paths variable
    for root, directories, files in walk(dirName):   #Read all directory, subdirectories and file lists
        for filename in files:
            filepath = join(root, filename)     #Create the full filepath by using os module.
            filepaths.append(filepath)

    return filepaths      #return all paths

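def main():
    for filepath in retrieve_file_paths(dirName):
        if not filepath.lower().endswith('.xlsx'):   # skip anything that is not an Excel workbook
            continue
        df = pd.read_excel(filepath)
        df2 = df.loc[(df['Track Item']=='Y')]
        final = sample_per(df2)
        df.loc[df['Retailer Item ID'].isin(final['Retailer Item ID']), 'Track Item'] = 'Audit'
        # write one CSV per workbook so later files do not overwrite earlier ones
        df.to_csv(filepath.rsplit('.', 1)[0] + '.csv', index=False)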

if __name__ == '__main__':
    main()
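The per-file step (sample some of the 'Y' rows and relabel them 'Audit') can be tried without any Excel files. The sketch below runs it on a small in-memory DataFrame. The helper name `mark_audit_sample` and its `frac` and `seed` parameters are assumptions made for illustration. It also marks rows by index rather than by matching 'Retailer Item ID', which gives the same result only when the IDs are unique.

```python
import pandas as pd

def mark_audit_sample(df, frac=0.05, seed=None):
    """Return a copy of df in which a random fraction of the 'Y' rows is relabelled 'Audit'."""
    out = df.copy()
    tracked = out.loc[out["Track Item"] == "Y"]
    # seed is a hypothetical convenience so the sample can be reproduced
    sampled = tracked.sample(frac=frac, random_state=seed)
    # mark exactly the sampled rows, addressed by their index labels
    out.loc[sampled.index, "Track Item"] = "Audit"
    return out

# tiny in-memory stand-in for one workbook
demo = pd.DataFrame({
    "Retailer Item ID": range(100),
    "Track Item": ["Y"] * 60 + ["N"] * 40,
})
result = mark_audit_sample(demo, frac=0.05, seed=0)
print(result["Track Item"].value_counts())
```

Returning a copy keeps the input frame untouched, which makes the step easier to call once per file inside a loop.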

You were on the right track; it was pd.concat(), not glob, that was responsible for merging your Excel files. This snippet should help you:

import pandas as pd
import glob

# use a shell-style wildcard pattern (not a regex) to match all files with the xlsx extension
root_dir = r"excel/*.xlsx"
# this call of glob only gives xlsx files in the root_dir
excel_files = glob.glob(root_dir)

# iterate over the files
for xls in excel_files:
    # read
    df_excel = pd.read_excel(xls)
    # manipulate as you wish here
    df_new = df_excel.sample(frac=0.1)
    # store
    # swap only the final extension; str.replace would also alter "xlsx" inside folder names
    df_new.to_csv(xls.rsplit(".", 1)[0] + ".csv")
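Giving every input file its own output name is what stops each iteration from overwriting the previous CSV. As a hedged sketch, the name can be derived with `pathlib`; the helper `csv_path_for` and its optional `out_dir` parameter are hypothetical and only illustrate the idea.

```python
from pathlib import Path

def csv_path_for(excel_path, out_dir=None):
    """Return the .csv path matching excel_path, optionally placed in out_dir."""
    source = Path(excel_path)
    target_dir = Path(out_dir) if out_dir is not None else source.parent
    # with_suffix only swaps the final extension, so folder names are left alone
    return target_dir / source.with_suffix(".csv").name

print(csv_path_for("excel/xlsx_reports/q1.xlsx"))
print(csv_path_for("excel/xlsx_reports/q1.xlsx", out_dir="csv_out"))
```

Inside the loop, the result could be passed straight to `to_csv`, since pandas accepts path objects as well as strings.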

Note that you can also pass recursive=True in the glob call (available since Python 3.5). It only has an effect when the pattern contains a ** component, in which case the Excel files in subdirectories are returned as well.
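A short sketch of the difference between the flat and the recursive search, assuming a hypothetical wrapper named `find_excel_files`; only `glob.glob` and its `recursive` argument come from the standard library.

```python
import glob
import os

def find_excel_files(root_dir, recursive=False):
    """Return sorted .xlsx paths directly in root_dir, or in all subfolders too when recursive=True."""
    if recursive:
        # "**" matches zero or more directory levels, so top-level files are still included
        pattern = os.path.join(root_dir, "**", "*.xlsx")
    else:
        pattern = os.path.join(root_dir, "*.xlsx")
    return sorted(glob.glob(pattern, recursive=recursive))
```

Calling `find_excel_files("excel")` would list only the workbooks directly inside `excel`, while `find_excel_files("excel", recursive=True)` would also descend into its subfolders.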


