Unable to download a zipped CSV file using the Python requests library

Question

I am trying to download a zipped CSV file from https://www1.nseindia.com/products/content/derivatives/equities/archieve_fo.htm.

To see the file, select Report: Bhavcopy and Date: 18-06-2021. This gives the zip file fo18JUN2021bhav.csv.zip with the URL https://www1.nseindia.com/content/historical/DERIVATIVES/2021/JUN/fo18JUN2021bhav.csv.zip
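The URL above looks like it is derived from the selected date. The following is a minimal sketch of that mapping, assuming the layout seen in this single example (year folder, upper-case three-letter month folder, `foDDMMMYYYYbhav.csv.zip` file name) holds for other dates. The helper name `bhavcopy_url` and the constants are hypothetical, not part of any NSE API.

```python
from datetime import date

# Assumption: base path and file-name layout are inferred from one example URL.
BASE = "https://www1.nseindia.com/content/historical/DERIVATIVES"

# Explicit month list so the result does not depend on the system locale.
MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def bhavcopy_url(day: date) -> str:
    """Build the presumed F&O bhavcopy zip URL for the given date (hypothetical helper)."""
    month = MONTHS[day.month - 1]
    filename = "fo{:02d}{}{}bhav.csv.zip".format(day.day, month, day.year)
    return "{}/{}/{}/{}".format(BASE, day.year, month, filename)


print(bhavcopy_url(date(2021, 6, 18)))
# https://www1.nseindia.com/content/historical/DERIVATIVES/2021/JUN/fo18JUN2021bhav.csv.zip
```

A URL built this way still has to be requested with suitable headers, and it may not exist for dates on which no report was published.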

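Now, when I make a GET call to the zipped CSV URL using the requests library (with appropriate headers), I get a [404] error.

Is there a way to download it programmatically? Thanks!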

Answer 1

To get the file, set the User-Agent and Referer HTTP headers:

import requests
from bs4 import BeautifulSoup

# Change the file type and date to the report you want to search for:
url = "https://www1.nseindia.com/ArchieveSearch?h_filetype=fobhav&date=18-06-2021&section=FO"

# Send a browser-like User-Agent and the archive page as Referer.
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0",
    "Referer": "https://www1.nseindia.com/products/content/derivatives/equities/archieve_fo.htm",
}

# The search response is HTML; its first link points to the zip file.
response = requests.get(url, headers=headers)
response.raise_for_status()  # fail early on 4xx/5xx instead of parsing an error page
soup = BeautifulSoup(response.content, "html.parser")
csv_url = "https://www1.nseindia.com" + soup.a["href"]

print("Downloading {}...".format(csv_url))
download = requests.get(csv_url, headers=headers)
download.raise_for_status()  # avoid writing an error page to disk
with open(csv_url.split("/")[-1], "wb") as f_out:
    f_out.write(download.content)
print("Done.")

This downloads the file fo18JUN2021bhav.csv.zip:

Downloading https://www1.nseindia.com/content/historical/DERIVATIVES/2021/JUN/fo18JUN2021bhav.csv.zip...
Done.

$ ls -alF fo18JUN2021bhav.csv.zip 
-rw-r--r-- 1 root root 632963 june 20 23:11 fo18JUN2021bhav.csv.zip
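
Once the archive is on disk, its contents can be read without unpacking it by hand. The following is a minimal sketch using only the standard library, assuming the archive holds a single UTF-8 encoded CSV file. The function name `read_zipped_csv` is hypothetical, and the demo data is a made-up placeholder, not real bhavcopy content.

```python
import csv
import io
import zipfile


def read_zipped_csv(zip_bytes: bytes) -> list:
    """Return the rows of the first file in a zip archive, parsed as CSV."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # Assumption: the first archive member is the CSV we want.
        first_member = archive.namelist()[0]
        with archive.open(first_member) as raw:
            # Wrap the binary stream so csv.reader receives text lines.
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            return list(csv.reader(text))


# Demo with an in-memory archive holding placeholder data.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("demo.csv", "col_a,col_b\n1,2\n")

print(read_zipped_csv(buffer.getvalue()))
# [['col_a', 'col_b'], ['1', '2']]
```

For the downloaded file, the bytes would come from `open("fo18JUN2021bhav.csv.zip", "rb").read()` or directly from the `content` of the download response.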
