
Failing to download a zipped csv file using python requests library

I am trying to download a zipped csv file from https://www1.nseindia.com/products/content/derivatives/equities/archieve_fo.htm.

To view the file, select Report: Bhavcopy and Date: 18-06-2021. The page then offers the zip file fo18JUN2021bhav.csv.zip at the URL https://www1.nseindia.com/content/historical/DERIVATIVES/2021/JUN/fo18JUN2021bhav.csv.zip

Now, when I make a GET call to the zipped csv URL with the requests library (with proper headers), I get a [404] error.

Is there a way to download it programmatically? Thanks!

To get the file, set the User-Agent and Referer HTTP headers:

import requests
from bs4 import BeautifulSoup

# change filetype and date you want to search for:
url = "https://www1.nseindia.com/ArchieveSearch?h_filetype=fobhav&date=18-06-2021&section=FO"

headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0",
    "Referer": "https://www1.nseindia.com/products/content/derivatives/equities/archieve_fo.htm",
}

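# Fetch the search result and take its first link as the zip file location.
response = requests.get(url, headers=headers)
response.raise_for_status()  # stop early on a 4xx/5xx response
soup = BeautifulSoup(response.content, "html.parser")
csv_url = "https://www1.nseindia.com" + soup.a["href"]

print("Downloading {}...".format(csv_url))
zip_response = requests.get(csv_url, headers=headers)
zip_response.raise_for_status()  # avoid saving an error page as the zip
with open(csv_url.split("/")[-1], "wb") as f_out:
    f_out.write(zip_response.content)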
print("Done.")

This downloads the file fo18JUN2021bhav.csv.zip:

Downloading https://www1.nseindia.com/content/historical/DERIVATIVES/2021/JUN/fo18JUN2021bhav.csv.zip...
Done.
$ ls -alF fo18JUN2021bhav.csv.zip 
-rw-r--r-- 1 root root 632963 june 20 23:11 fo18JUN2021bhav.csv.zip
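
Since the download is a zipped csv, a natural next step is to read the rows without unpacking the archive to disk. The following is only a minimal sketch using the standard library: the helper name read_zipped_csv is hypothetical, and it assumes the archive holds a single UTF-8 encoded .csv member, which the original answer does not confirm.

```python
import csv
import io
import zipfile


def read_zipped_csv(zip_bytes, encoding="utf-8"):
    """Return all rows of the first .csv member found in an in-memory zip.

    Hypothetical helper: the single-member layout and the UTF-8 encoding
    are assumptions, not facts stated about the NSE archive.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # Collect members whose name ends in .csv and use the first one.
        csv_names = [n for n in archive.namelist() if n.lower().endswith(".csv")]
        if not csv_names:
            raise ValueError("no .csv member found in archive")
        with archive.open(csv_names[0]) as raw:
            # Decode the byte stream as text so csv.reader can parse it.
            text = io.TextIOWrapper(raw, encoding=encoding, newline="")
            return list(csv.reader(text))


# Possible usage with the file saved by the script above:
#     with open("fo18JUN2021bhav.csv.zip", "rb") as f_in:
#         rows = read_zipped_csv(f_in.read())
#     header, data = rows[0], rows[1:]
```

The same function can be given the content of the second requests.get call directly, which skips writing the zip file at all.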

