
How do I extract and process all the files in a zipfile?

I want to extract and process all the files in a zip file.

import re
import zipfile
import pathlib
import pandas as pd


# Download mHealth dataset
def parse(zip_file):
    # Extract all the files in output directory
    with zipfile.ZipFile(zip_file, "r") as zfile:

        for file in zfile.extractall():
            if file.is_file():
                old_name = file.stem
                extension = file.suffix
                directory = file.parent

                new_name = re.sub("mHealth_", "", old_name) + extension
                file = file.rename(pathlib.Path(directory, new_name))
        zfile.close()
        return file

Traceback:

Traceback (most recent call last):
  File "C:\Users\User\PycharmProjects\algorithms\project_kmeans.py", line 47, in <module>
    df_ = parse(zip_file_)
  File "C:\Users\User\PycharmProjects\algorithms\project_kmeans.py", line 12, in parse
    for file in zfile.extractall():
TypeError: 'NoneType' object is not iterable

Process finished with exit code 1

To loop over the files, you need to iterate over infolist() or namelist() instead of over extractall().

extractall() extracts the files from the zip, but it doesn't return their names (it returns None), so it can't be used in a for-loop.
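
The difference is easy to see with a tiny archive built in memory. This is a minimal sketch: the member names mHealth_subject1.log and README.txt are hypothetical, and the files are extracted into a throwaway temporary directory instead of the working directory.

```python
import io
import tempfile
import zipfile

# Build a small zip archive in memory (hypothetical member names).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("mHealth_subject1.log", "1 2 3\n")
    zf.writestr("README.txt", "example\n")

with zipfile.ZipFile(buffer, "r") as zfile, tempfile.TemporaryDirectory() as out_dir:
    # extractall() writes the files to disk but returns None,
    # which is why "for file in zfile.extractall()" raises TypeError.
    result = zfile.extractall(out_dir)
    print(result)  # None

    # namelist() returns the member names as plain strings.
    names = zfile.namelist()
    print(names)  # ['mHealth_subject1.log', 'README.txt']

    # infolist() returns ZipInfo objects; the name is in .filename.
    infos = zfile.infolist()
    print([info.filename for info in infos])
```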

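infolist() and namelist() do give you the file names, but that creates another problem: infolist() returns ZipInfo objects and namelist() returns strings, not Path objects, so they don't have .is_file(), .stem, etc. You have to convert them to Path yourself.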
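
A minimal sketch of that conversion, assuming the archive is extracted into a known output directory. The helper name extracted_files and the member names are hypothetical. It skips directory entries with ZipInfo.is_dir() and joins each ZipInfo.filename onto the output directory, which gives Path objects that do have .is_file(), .stem and .suffix.

```python
import io
import pathlib
import tempfile
import zipfile


def extracted_files(zfile, out_dir):
    """Extract every member into out_dir and return Paths for the files.

    extracted_files is a hypothetical helper, not part of zipfile.
    """
    out_dir = pathlib.Path(out_dir)
    zfile.extractall(out_dir)  # returns None, so we don't iterate over it
    paths = []
    for info in zfile.infolist():
        if info.is_dir():  # ZipInfo has is_dir() (Python 3.6+) but no is_file()
            continue
        # ZipInfo.filename is a str; join it onto the output directory
        paths.append(out_dir / info.filename)
    return paths


# Usage with a small in-memory archive (hypothetical member names).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("data/mHealth_subject1.log", "1 2 3\n")
    zf.writestr("data/mHealth_subject2.log", "4 5 6\n")

with zipfile.ZipFile(buffer) as zfile, tempfile.TemporaryDirectory() as tmp:
    files = extracted_files(zfile, tmp)
    flags = [path.is_file() for path in files]
    for path in files:
        print(path.is_file(), path.stem, path.suffix)  # e.g. True mHealth_subject1 .log
```

The join assumes ordinary member names; for names with absolute paths or "..", extractall() sanitizes the target path, so the joined Path may not match the file that was actually written.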

import zipfile
import pathlib
import pandas as pd

# Extract the mHealth dataset and remove the "mHealth_" prefix from the file names
def parse(zip_file):
    
    results = []
    
    # Extract all the files into the current working directory
    with zipfile.ZipFile(zip_file, "r") as zfile:

        zfile.extractall()  # extract (returns None, so it can't be looped over)
        
        #for filename in zfile.namelist():
        #    path = pathlib.Path(filename)

        for fileinfo in zfile.infolist():
            filename = fileinfo.filename
            path = pathlib.Path(filename)

            if path.is_file():
                old_name = path.stem
                extension = path.suffix
                directory = path.parent

                new_name = old_name.replace("mHealth_", "") + extension
                path = path.rename(pathlib.Path(directory, new_name))
                print('path:', path)
                results.append([filename, new_name])
                
    df = pd.DataFrame(results, columns=['old', 'new'])
    return df

df = parse('test.zip')
print(df)
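
A variant that avoids converting names by hand: ZipFile.extract() (unlike extractall()) returns the path of the file it just wrote, so each member can be extracted and renamed in the same loop. This is a sketch, not the answer's code: parse_to_dir, its out_dir parameter and the member names are hypothetical, and it returns the old/new pairs as a list; pass them to pd.DataFrame(results, columns=['old', 'new']) if you want the same DataFrame as above.

```python
import io
import pathlib
import tempfile
import zipfile


def parse_to_dir(zip_file, out_dir="."):
    """Extract every file in zip_file into out_dir, strip the "mHealth_"
    prefix from each file name, and return (old_name, new_name) pairs.

    parse_to_dir and out_dir are hypothetical names used for illustration.
    """
    results = []
    with zipfile.ZipFile(zip_file, "r") as zfile:
        for info in zfile.infolist():
            if info.is_dir():
                continue  # directory entries need no renaming
            # extract() returns the path (a str) of the file it just wrote
            path = pathlib.Path(zfile.extract(info, out_dir))
            new_name = path.stem.replace("mHealth_", "") + path.suffix
            if new_name != path.name:
                new_path = path.with_name(new_name)
                path.rename(new_path)
                path = new_path
            results.append((info.filename, path.name))
    return results


# Usage with a small in-memory archive (hypothetical member names).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("dataset/mHealth_subject1.log", "1 2 3\n")
    zf.writestr("dataset/README.txt", "example\n")

with tempfile.TemporaryDirectory() as tmp:
    renamed = parse_to_dir(buffer, tmp)
    renamed_exists = (pathlib.Path(tmp) / "dataset" / "subject1.log").is_file()
    print(renamed)
```

Extracting into an explicit directory also keeps the renamed files out of the current working directory. If a file with the new name already exists, Path.rename() raises FileExistsError on Windows and silently replaces the file on Unix.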

Docs: infolist, extractall


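Notice: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site's URL or the original post's address. For any questions, contact: yoyou2525@163.com.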

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM