Reading multiple txt files from multiple folders

I have 20 folders, each containing 50 txt files. I need to read all of them in order to compare the word counts of each folder. I know how to read multiple files in one folder, but it is slow. Is there a more efficient way than reading the folders one by one, as below?

import re
import pandas as pd
import seaborn as sns
from matplotlib import pyplot as plt

import os
import glob

1. folder1

folder_path = '/home/runner/Final-Project/folder1'

for filename in glob.glob(os.path.join(folder_path, '*.txt')):
  with open(filename, 'r') as f:
    text = f.read()
    print(filename)
    # len(text) is the number of characters, not the number of words
    print(len(text))

2. folder2

folder_path = '/home/runner/Final-Project/folder2'

for filename in glob.glob(os.path.join(folder_path, '*.txt')):
  with open(filename, 'r') as f:
    text = f.read()
    print(filename)
    print(len(text))

You can do something similar using glob, as you already have, but with a wildcard for the directory names.

folder_path = '/home/runner/Final-Project'

for filename in glob.glob(os.path.join(folder_path, '*', '*.txt')):
    # process your files
    pass  # placeholder so the loop is valid Python

The first '*' in os.path.join() matches a directory of any name. Calling glob.glob() like this therefore finds every text file in any directory directly inside folder_path.
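
To compare word counts per folder, the matches from that single glob call can be grouped by their parent directory. This is only a minimal sketch: it assumes a "word" is any whitespace-separated token and that the files are UTF-8, and the function name word_counts_by_folder is made up for illustration.

```python
import glob
import os
from collections import Counter


def word_counts_by_folder(root):
    """Return a Counter mapping each subfolder name to its total word count."""
    totals = Counter()
    for filename in glob.glob(os.path.join(root, '*', '*.txt')):
        # The name of the directory that contains this file, e.g. 'folder1'
        folder = os.path.basename(os.path.dirname(filename))
        # Assumption: UTF-8 text; undecodable bytes are replaced, not fatal
        with open(filename, 'r', encoding='utf-8', errors='replace') as f:
            totals[folder] += len(f.read().split())
    return totals
```

Calling word_counts_by_folder('/home/runner/Final-Project') would then give one total per folder, which can be printed or passed to pandas for plotting.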

The function below returns a list of the files in a directory and all of its sub-directories without using glob. Iterate over that list and open each file to read it.

def list_of_files(dirName):
    files_list = os.listdir(dirName)
    all_files = list()
    for entry in files_list:
        # Create full path
        full_path = os.path.join(dirName, entry)
        if os.path.isdir(full_path):
            all_files = all_files + list_of_files(full_path)
        else:
            all_files.append(full_path)

    return all_files

print(list_of_files('<Dir Path>'))  # replace '<Dir Path>' with your directory path
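
The standard library's os.walk performs the same recursive traversal, so the function above could also be written without explicit recursion. The sketch below uses a hypothetical name, list_txt_files, and adds a filter for .txt files. That filter is an assumption based on the question and is not part of the answer above.

```python
import os


def list_txt_files(dir_name):
    """Return the paths of all .txt files under dir_name, at any depth."""
    txt_files = []
    # os.walk yields (directory, subdirectory names, file names) for every level
    for current_dir, _subdirs, filenames in os.walk(dir_name):
        for name in filenames:
            if name.endswith('.txt'):
                txt_files.append(os.path.join(current_dir, name))
    return txt_files
```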
