
Loop through files in a different folder in Python

I have a problem with a loop in Python. My folder looks like this:

|folder_initial
       |--data_loop
                   |--example1
                   |--example2
                   |--example3
       |--python_jupyter_notebook

I would like to loop through all files in data_loop, open each one, run a simple operation, save it under another name, and then do the same with the next file. I have written the following code:

import pandas as pd
import numpy as np
import os

def scan_folder(parent):
    # iterate over all the files in directory 'parent'
    for file_name in os.listdir(parent):
        if file_name.endswith(".csv"):
            print(file_name)
            df = pd.read_csv("RMB_IT.csv", low_memory=False, header=None,
                             names=['column1','column2','column3','column4'])

            df = df[['column2','column4']]
            # Substitute ND with missing data
            df = df.replace('ND,1',np.nan)
            df = df.replace('ND,2',np.nan)
            df = df.replace('ND,3',np.nan)
            df = df.replace('ND,4',np.nan)
            df = df.replace('ND,5',np.nan)
            df = df.replace('ND,6',np.nan)

        else:
            current_path = "".join((parent, "/", file_name))
            if os.path.isdir(current_path):
                # if we're checking a sub-directory, recall this method
                scan_folder(current_path)

scan_folder("./data_loop")  # Insert parent directory's path

I get the error:

FileNotFoundError 
FileNotFoundError: File b'example2.csv' does not exist

Moreover, I would like to run the code without having to keep the Jupyter notebook inside folder_initial. I would like a layout like this instead:

|scripts
        |--Jupyter Notebook
|data
     |---csv files
                  |--example1.csv
                  |--example2.csv

Any ideas?

Edit: following a user's suggestion, I wrote something like this:

import os
import glob
os.chdir('C:/Users/bedinan/Documents/python_scripts_v02/data_loop')
for file in list(glob.glob('*.csv')):
    df = pd.read_csv(file, low_memory=False, header=None, names=[

    df = df[[

    #Substitute ND with missing data
    df = df.replace('ND,1',np.nan)
    df = df.replace('ND,2',np.nan)
    df = df.replace('ND,3',np.nan)
    df = df.replace('ND,4',np.nan)
    df = df.replace('ND,5',np.nan)
    df = df.replace('ND,6',np.nan)

    df.to_pickle(file+"_v02"+".pkl")

f = pd.read_pickle('folder\\data_loop\\RMB_PT.csv_v02.pkl')

But the resulting file name is not composed properly, because it still contains the .csv extension (for example RMB_PT.csv_v02.pkl).

You can use this answer to iterate over all subfolders:

import os
import shutil
import pathlib
import pandas as pd

def scan_folder(root):
    for path, subdirs, files in os.walk(root):
        for name in files:
            if name.endswith('.csv'):
                src = pathlib.PurePath(path, name)
                dst = pathlib.PurePath(path, 'new_' + name)
                shutil.copyfile(src, dst)
                df = pd.read_csv(dst)
                # do something with df
                df.to_csv(dst, index=False)  # write the result back to the copy

scan_folder(r'C:\User\Desktop\so\55648849')
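
A likely cause of the FileNotFoundError in the question is that os.listdir returns bare file names, so pd.read_csv looks for them in the current working directory rather than in data_loop. The answer above avoids this by joining path and name before opening the file. A minimal sketch of the same idea for a single folder follows; the helper name list_csv_paths is hypothetical and not part of the original posts.

```python
import os

def list_csv_paths(parent):
    """Return the full paths of the .csv files directly inside `parent`.

    Hypothetical helper for illustration only.
    """
    paths = []
    for file_name in sorted(os.listdir(parent)):
        if file_name.endswith(".csv"):
            # os.listdir yields bare names, so prepend the directory
            paths.append(os.path.join(parent, file_name))
    return paths
```

Each returned path can be passed to pd.read_csv as-is, regardless of which folder the notebook is started from.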

Here's a solution that uses only pathlib, which I'm quite a big fan of. I pulled your DataFrame operations out into their own function, which you can rename and rewrite to do what you actually want.

import pandas as pd
import numpy as np

from pathlib import Path

# rename the function to something more relevant
def df_operation(csv_path):
    df = pd.read_csv(
        csv_path.absolute(),
        low_memory=False,
        header=None,
        names=['column1','column2','column3','column4']
    )
    # do some stuff with the dataframe

def scan_folder(parent):

    p = Path(parent)

    # Probably want a check here to make sure the provided 
    # parent is a directory, not a file
    assert p.is_dir()

    # rglob also searches sub-directories
    for f in p.rglob('*.csv'):
        df_operation(f)

scan_folder("./example/dir")
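
For the file-name problem described in the question's edit (RMB_PT.csv_v02.pkl), one option is to build the output name from the path's stem, which is the file name without its last extension. This is only a sketch: the helper name pickle_path_for and its tag and out_dir parameters are assumptions made for illustration, not something from the original posts.

```python
from pathlib import Path

def pickle_path_for(csv_path, tag="_v02", out_dir=None):
    """Build an output path such as RMB_PT_v02.pkl from RMB_PT.csv.

    Hypothetical helper: `tag` and `out_dir` are illustrative parameters.
    """
    csv_path = Path(csv_path)
    # Default to writing next to the source file
    target_dir = Path(out_dir) if out_dir is not None else csv_path.parent
    # .stem drops the final extension, so ".csv" no longer ends up in the name
    return target_dir / f"{csv_path.stem}{tag}.pkl"
```

Inside the loop this could be used as df.to_pickle(pickle_path_for(file)). Passing out_dir also allows the results to be written to a folder separate from the one holding the notebook.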


