Python: reading zip files in subdirectories. 'WindowsPath' object is not iterable
I am trying to loop through my subdirectories to read in my zip files. I am getting this error:
TypeError: 'WindowsPath' object is not iterable
What I am trying:
path = Path("O:/Stack/Over/Flow/")
for p in path.rglob("*"):
    print(p.name)
    zip_files = (str(x) for x in Path(p.name).glob("*.zip"))
    df = process_files(p) #function
What does work is pointing my path directly at the folder:
path = r'O:/Stack/Over/Flow/2022 - 10/'
zip_files = (str(x) for x in Path(path).glob("*.zip"))
df = process_files(zip_files)
Any help would be appreciated.
The directory structure looks like this:
//Stack/Over/Flow/2022 - 10/Original.zip
//Stack/Over/Flow/2022 - 09/Next file.zip
The function I call:
from io import BytesIO
from pathlib import Path
from zipfile import ZipFile
import os
import pandas as pd

def process_files(files: list) -> pd.DataFrame:
    file_mapping = {}
    for file in files:
        #data_mapping = pd.read_excel(BytesIO(ZipFile(file).read(Path(file).stem)), sheet_name=None)
        archive = ZipFile(file)
        # find file names in the archive which end in `.xls`, `.xlsx`, `.xlsb`, ...
        files_in_archive = archive.namelist()
        excel_files_in_archive = [
            f for f in files_in_archive if Path(f).suffix[:4] == ".xls"
        ]
        # ensure we only have one file (otherwise, loop or choose one somehow)
        assert len(excel_files_in_archive) == 1
        # read in data
        data_mapping = pd.read_excel(
            BytesIO(archive.read(excel_files_in_archive[0])),
            sheet_name=None,
        )
        row_counts = []
        for sheet in list(data_mapping.keys()):
            row_counts.append(len(data_mapping.get(sheet)))
        file_mapping.update({file: sum(row_counts)})
    frame = pd.DataFrame([file_mapping]).transpose().reset_index()
    frame.columns = ["file_name", "row_counts"]
    return frame
I suspect the problem is around the zip_files line. Path(p.name) keeps only the last component of p, so it no longer points at the subdirectory, and process_files(p) is passed a single Path object instead of the zip files, which is what raises the TypeError when the function loops over it. Use the full Path of the subdirectory (built with the joinpath method of the Path object, or simply p itself, since rglob already returns full paths), call the glob method on that Path object to get the zip files, and pass the result to process_files.
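path = Path("O:/Stack/Over/Flow/")
for p in path.rglob("*"):
    if not p.is_dir():
        continue  # skip files; only directories are searched for zip files
    print(p.name)
    # p is already the full path of the subdirectory, so glob on it directly
    zip_files = (str(x) for x in p.glob("*.zip"))
    df = process_files(zip_files)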
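If a single result for the whole tree is enough, the directory loop can be skipped by letting rglob match the zip files directly. The following is only a sketch under that assumption; find_zip_files is a hypothetical helper name and is not part of the question's code.

```python
from pathlib import Path


def find_zip_files(base_dir):
    """Return the paths of all .zip files below base_dir as sorted strings.

    Hypothetical helper for illustration; not part of the original code.
    """
    # rglob yields full paths relative to base_dir, so no joinpath is needed.
    return sorted(str(p) for p in Path(base_dir).rglob("*.zip") if p.is_file())
```

Because the helper returns a list of path strings, it could be handed to the function from the question in one call, for example process_files(find_zip_files("O:/Stack/Over/Flow/")).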
A possible solution using os.walk():
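import os

main_path = "O:/Stack/Over/Flow/"  # base directory from the question
for root, dirs, files in os.walk(main_path):
    for file in files:
        if file.endswith('.zip'):
            # process_files loops over its argument, so wrap the single path in a list
            df = process_files([os.path.join(root, file)])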
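The loop above overwrites df on every zip file. One way to keep all results, sketched here under the assumption that one DataFrame per folder is wanted, is to collect the zip paths folder by folder first. The name zip_files_per_folder is hypothetical and used only for illustration.

```python
import os


def zip_files_per_folder(main_path):
    """Yield (folder, [zip paths]) for each folder under main_path that holds zip files.

    Hypothetical helper for illustration; not part of the original answer.
    """
    for root, dirs, files in os.walk(main_path):
        zips = [
            os.path.join(root, name)
            for name in sorted(files)
            if name.lower().endswith(".zip")
        ]
        if zips:
            yield root, zips


# Possible usage with the question's function (requires pandas and process_files):
# frames = [process_files(zips) for _, zips in zip_files_per_folder(main_path)]
# result = pd.concat(frames, ignore_index=True)
```

Each list of paths is an iterable, which is what process_files expects, and pd.concat then stacks the per-folder frames into one table.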