How do I request a zipfile, extract it, then create pandas dataframes from the csv files?
How do I extract and process all the files in a zipfile?

    import re
    import zipfile
    import pathlib
    import pandas as pd

    # Download mHealth dataset
    def parse(zip_file):
        # Extract all the files in output directory
        with zipfile.ZipFile(zip_file, "r") as zfile:
            for file in zfile.extractall():
                if file.is_file():
                    old_name = file.stem
                    extension = file.suffix
                    directory = file.parent
                    new_name = re.sub("mHealth_", "", old_name) + extension
                    file = file.rename(pathlib.Path(directory, new_name))
            zfile.close()
        return file

Error traceback:

    Traceback (most recent call last):
      File "C:\Users\User\PycharmProjects\algorithms\project_kmeans.py", line 47, in <module>
        df_ = parse(zip_file_)
      File "C:\Users\User\PycharmProjects\algorithms\project_kmeans.py", line 12, in parse
        for file in zfile.extractall():
    TypeError: 'NoneType' object is not iterable

    Process finished with exit code 1

To iterate with a `for` loop you need `infolist()` or `namelist()` instead of `extractall()`.
`extractall()` extracts the files from the zip archive, but it returns `None` rather than the file names, so it cannot be used in a `for` loop.
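A quick way to confirm this is to compare what the two calls return. The sketch below builds a small throwaway archive in a temporary directory; the member names and the `inspect_zip_calls` helper are made up for illustration and are not part of the original question.

```python
import os
import tempfile
import zipfile

def inspect_zip_calls():
    """Build a tiny archive and report what extractall() and namelist() return."""
    with tempfile.TemporaryDirectory() as tmp:
        zip_path = os.path.join(tmp, "demo.zip")
        # Hypothetical member names, used only for this demonstration.
        with zipfile.ZipFile(zip_path, "w") as zf:
            zf.writestr("mHealth_subject1.log", "1\t2\t3\n")
            zf.writestr("mHealth_subject2.log", "4\t5\t6\n")

        with zipfile.ZipFile(zip_path, "r") as zf:
            # extractall() writes the files to disk and returns None
            extract_result = zf.extractall(os.path.join(tmp, "out"))
            # namelist() returns the member names as a list of strings
            names = zf.namelist()
    return extract_result, names

extract_result, names = inspect_zip_calls()
print(extract_result)
print(names)
```

Because `extract_result` is `None`, putting `zfile.extractall()` directly in a `for` statement raises the `TypeError` shown in the traceback.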
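`infolist()` and `namelist()` do provide the file names, but that creates another problem: they return `ZipInfo` objects or strings rather than `Path` objects, so the items have no `.is_file()`, `.stem`, and so on. You have to convert them to `Path`.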
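The conversion can be sketched in isolation as follows. The helper name `member_paths`, its `target_dir` parameter, and the sample member name are assumptions for illustration. The archive is built in memory so nothing is written to disk.

```python
import io
import pathlib
import zipfile

def member_paths(zfile, target_dir="."):
    """Return a pathlib.Path for every non-directory member of an open ZipFile."""
    base = pathlib.Path(target_dir)
    return [
        base / info.filename      # ZipInfo.filename is a plain string
        for info in zfile.infolist()
        if not info.is_dir()      # skip directory entries
    ]

# Demonstration with an in-memory archive; the member name is made up.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("data/mHealth_subject1.log", "1\t2\t3\n")

with zipfile.ZipFile(buffer, "r") as zf:
    paths = member_paths(zf, "out")

# Path attributes are now available on each item
print(paths[0].stem, paths[0].suffix, paths[0].parent)
```

Note that these paths only describe where a file would be after extraction into `target_dir`. `Path.is_file()` returns `True` only once the archive has actually been extracted there.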

    import zipfile
    import pathlib
    import pandas as pd

    # Extract the archive and strip the "mHealth_" prefix from the file names
    def parse(zip_file):
        results = []

        # Extract all the files into the current working directory
        with zipfile.ZipFile(zip_file, "r") as zfile:
            zfile.extractall()  # extract

            # Alternative: iterate over plain strings
            #for filename in zfile.namelist():
            #    path = pathlib.Path(filename)
            for fileinfo in zfile.infolist():
                filename = fileinfo.filename
                path = pathlib.Path(filename)  # convert the string to a Path

                if path.is_file():
                    old_name = path.stem
                    extension = path.suffix
                    directory = path.parent
                    new_name = old_name.replace("mHealth_", "") + extension
                    path = path.rename(pathlib.Path(directory, new_name))
                    print('path:', path)
                    results.append([filename, new_name])

        # One row per renamed file: original member name and new file name
        df = pd.DataFrame(results, columns=['old', 'new'])
        return df

    df = parse('test.zip')
    print(df)

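If the goal is to end up with one DataFrame per CSV file, as the title asks, the members can also be read straight from the archive without extracting them. This is a minimal sketch, not the answer's method: the function name `read_csvs_from_zip`, the sample member names, and the assumption that each CSV is comma-separated with a header row are all illustrative.

```python
import io
import pathlib
import zipfile

import pandas as pd

def read_csvs_from_zip(zip_source, prefix="mHealth_"):
    """Return {member stem without prefix: DataFrame} for every .csv member."""
    frames = {}
    with zipfile.ZipFile(zip_source, "r") as zfile:
        for info in zfile.infolist():
            name = pathlib.PurePosixPath(info.filename)
            if info.is_dir() or name.suffix.lower() != ".csv":
                continue  # skip directories and non-CSV members
            key = name.stem.replace(prefix, "")
            # zfile.open() gives a binary file object that read_csv can consume
            with zfile.open(info) as handle:
                frames[key] = pd.read_csv(handle)
    return frames

# Demonstration with an in-memory archive; member names and data are made up.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("mHealth_a.csv", "x,y\n1,2\n3,4\n")
    zf.writestr("notes.txt", "not a csv\n")

frames = read_csvs_from_zip(buffer)
print(list(frames))
print(frames["a"])
```

Reading through `zfile.open()` avoids leaving extracted files on disk. If the files are needed on disk anyway, the `parse()` approach above is the more direct route.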
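Notice: technical posts on this site follow the CC BY-SA 4.0 license. If you reproduce this post, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.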