Python: reading zip files in subdirectories. 'WindowsPath' object is not iterable

I am trying to loop through my subdirectories to read in my zip files. I am getting the error TypeError: 'WindowsPath' object is not iterable

What I am trying:

path = Path("O:/Stack/Over/Flow/")
for p in path.rglob("*"):
     print(p.name)
     zip_files = (str(x) for x in Path(p.name).glob("*.zip"))
     df = process_files(p)   #function

What does work, when I point my path directly at the folder:

path = r'O:/Stack/Over/Flow/2022 - 10/'
zip_files = (str(x) for x in Path(path).glob("*.zip"))
df = process_files(zip_files)

Any help would be appreciated.

The directory structure looks like this:

 //Stack/Over/Flow/2022 - 10/Original.zip 
 //Stack/Over/Flow/2022 - 09/Next file.zip

The function I call:

from io import BytesIO
from pathlib import Path
from zipfile import ZipFile
import os
import pandas as pd


def process_files(files: list) -> pd.DataFrame:
    file_mapping = {}
    for file in files:
        #data_mapping = pd.read_excel(BytesIO(ZipFile(file).read(Path(file).stem)), sheet_name=None)
        
        archive = ZipFile(file)

        # find file names in the archive which end in `.xls`, `.xlsx`, `.xlsb`, ...
        files_in_archive = archive.namelist()
        excel_files_in_archive = [
            f for f in files_in_archive if Path(f).suffix[:4] == ".xls"
        ]
        # ensure we only have one file (otherwise, loop or choose one somehow)
        assert len(excel_files_in_archive) == 1

        # read in data
        data_mapping = pd.read_excel(
            BytesIO(archive.read(excel_files_in_archive[0])),
            sheet_name=None,
        )

        row_counts = []
        for sheet in list(data_mapping.keys()):
            row_counts.append(len(data_mapping.get(sheet)))

        file_mapping.update({file: sum(row_counts)})

    frame = pd.DataFrame([file_mapping]).transpose().reset_index()
    frame.columns = ["file_name", "row_counts"]

    return frame

I suspect the error comes from the call process_files(p): p is a single WindowsPath, and the for loop inside process_files tries to iterate over it. The zip_files line has a second problem: Path(p.name) keeps only the last path component, so the glob no longer runs inside the subdirectory. Call the glob method directly on each subdirectory p to get its zip files, and pass that result to process_files:

path = Path("O:/Stack/Over/Flow/")
for p in path.rglob("*"):
    if not p.is_dir():
        continue  # only directories can contain zip files
    print(p.name)
    zip_files = (str(x) for x in p.glob("*.zip"))
    df = process_files(zip_files)
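To illustrate where a TypeError like the one in the question can come from, here is a minimal sketch. The function count_items is a hypothetical stand-in for process_files that only iterates over its argument, and the file name is made up and never opened. On Windows the message names 'WindowsPath'; on other systems it names 'PosixPath'.

```python
from pathlib import Path


def count_items(files):
    # Hypothetical stand-in for process_files: it only iterates over its argument.
    return sum(1 for _ in files)


single = Path("archive.zip")  # made-up name; the file is never opened

try:
    count_items(single)  # a Path object does not support iteration
    raised = False
    message = ""
except TypeError as exc:
    raised = True
    message = str(exc)

# Wrapping the single path in a list gives the function something iterable.
wrapped_count = count_items([single])
```

So a function written for a collection of files needs a list or generator, even when there is only one file to process.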

A possible solution using os.walk():

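import os

main_path = "O:/Stack/Over/Flow/"
for root, dirs, files in os.walk(main_path):
    for file in files:
        if file.endswith('.zip'):
            # process_files iterates over its argument, so wrap the single path in a list
            df = process_files([os.path.join(root, file)])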
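Both answers overwrite df on every pass, so it may help to first collect the zip files per folder and then decide how to process them. The following is only a sketch under assumptions: the helper name zip_files_by_folder is made up, folder names are assumed to be unique (as in the question's layout), and a temporary directory stands in for the real O:/Stack/Over/Flow/ tree.

```python
import tempfile
import zipfile
from collections import defaultdict
from pathlib import Path


def zip_files_by_folder(root):
    """Map each folder name under root to the sorted zip paths it directly contains."""
    grouped = defaultdict(list)
    for zip_path in Path(root).rglob("*.zip"):
        # Assumes folder names are unique; use str(zip_path.parent) as the key otherwise.
        grouped[zip_path.parent.name].append(str(zip_path))
    return {folder: sorted(paths) for folder, paths in grouped.items()}


# Build a throwaway tree that mimics the layout in the question.
with tempfile.TemporaryDirectory() as tmp:
    layout = [("2022 - 10", "Original.zip"), ("2022 - 09", "Next file.zip")]
    for folder, name in layout:
        target = Path(tmp) / folder
        target.mkdir()
        with zipfile.ZipFile(target / name, "w") as archive:
            archive.writestr("data.txt", "placeholder")

    grouped = zip_files_by_folder(tmp)
    folder_names = sorted(grouped)
```

Each list in grouped is iterable, so it could be passed to process_files from the question, keeping one result per folder.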
