How to scrape zip files into a single dataframe in Python

Question

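I'm very new to web scraping, and I'm trying to understand how to scrape all of the zip files and regular files on this website. The end goal is to scrape all of the data. I originally thought I could use pd.read_html, pass in a list of the links, and loop through each zip file.

Any help would be very useful. I've tried a few examples so far; please see the code below.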

import pandas as pd
pd.read_html("https://www.omie.es/en/file-access-list?parents%5B0%5D=/&parents%5B1%5D=Day-ahead%20Market&parents%5B2%5D=1.%20Prices&dir=%20Day-ahead%20market%20hourly%20prices%20in%20Spain&realdir=marginalpdbc",match="marginalpdbc_2017.zip")

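This is roughly what I'd like the output to look like, except that each zip file needs to be its own dataframe so that I can use it or loop over it. At the moment, all it seems to do is return the names of the zip files, not the actual data.

Thanks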

Answer 1

To open a zip file and read the files inside it into a dataframe, you can use the following example:

import requests
import pandas as pd
from io import BytesIO
from zipfile import ZipFile

zip_url = "https://www.omie.es/es/file-download?parents%5B0%5D=marginalpdbc&filename=marginalpdbc_2017.zip"

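dfs = []
# download the archive and open it in memory, without saving it to disk
with ZipFile(BytesIO(requests.get(zip_url).content)) as zf:
    for file in zf.namelist():
        # skip the first and the last line of each file;
        # skipfooter is only supported by the "python" engine
        df = pd.read_csv(
            zf.open(file),
            sep=";",
            skiprows=1,
            skipfooter=1,
            engine="python",
            header=None,
        )
        dfs.append(df)

# combine the per-file dataframes into a single dataframe
final_df = pd.concat(dfs)

# print first 10 rows (to_markdown needs the "tabulate" package):
print(final_df.head(10).to_markdown(index=False))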

Prints:

0	1	2	3	4	5	6
2017	1	1	1	58.82	58.82	nan
2017	1	1	2	58.23	58.23	nan
2017	1	1	3	51.95	51.95	nan
2017	1	1	4	47.27	47.27	nan
2017	1	1	5	46.9	45.49	nan
2017	1	1	6	46.6	44.5	nan
2017	1	1	7	46.25	44.5	nan
2017	1	1	8	46.1	44.72	nan
2017	1	1	9	46.1	44.22	nan
2017	1	1	10	45.13	45.13	nan
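
If each zip file should end up as its own dataframe, the same approach can be wrapped in a small helper and called once per archive. The following is only a sketch under some assumptions: the names read_zip_bytes, fetch_years and URL_TEMPLATE are made up for illustration, and the URL pattern simply copies zip_url above with the year swapped out, so it assumes the other archives on the site follow the same naming. Unlike the example above, it also passes ignore_index=True to pd.concat so the combined rows get a fresh index.

```python
from io import BytesIO
from zipfile import ZipFile

import pandas as pd
import requests

# Assumption: other years use the same pattern as zip_url above.
URL_TEMPLATE = (
    "https://www.omie.es/es/file-download"
    "?parents%5B0%5D=marginalpdbc&filename=marginalpdbc_{year}.zip"
)


def read_zip_bytes(data: bytes) -> pd.DataFrame:
    """Read every file inside a zip archive (given as bytes) into one dataframe."""
    frames = []
    with ZipFile(BytesIO(data)) as zf:
        for name in zf.namelist():
            with zf.open(name) as fh:
                # same parsing options as in the example above
                frames.append(
                    pd.read_csv(
                        fh,
                        sep=";",
                        skiprows=1,
                        skipfooter=1,
                        engine="python",
                        header=None,
                    )
                )
    return pd.concat(frames, ignore_index=True)


def fetch_years(years) -> dict:
    """Download one archive per year and return a {year: dataframe} dict."""
    frames_by_year = {}
    for year in years:
        response = requests.get(URL_TEMPLATE.format(year=year), timeout=60)
        response.raise_for_status()  # stop early on a failed download
        frames_by_year[year] = read_zip_bytes(response.content)
    return frames_by_year
```

With this, fetch_years([2017]) would return a dict whose 2017 entry corresponds to final_df above. If a single dataframe is wanted after all, pd.concat(frames_by_year.values(), ignore_index=True) would combine the per-year dataframes.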
