How to download a file using web URL in python? Download through browser works but not through python's requests
Using python requests to mask as a browser and download a file
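I am trying to use the Python requests library to download a file from the following link: http://www.nasdaq.com/screening/companies-by-industry.aspx?exchange=NASDAQ&render=download
Clicking this link gives you a file (nasdaq.csv) only when you use a browser. I used the Firefox network monitor (Ctrl-Shift-Q) to retrieve all the headers that Firefox sends. Now I finally get a 200 server response, but still no file. The file this script produces contains parts of the Nasdaq website rather than the CSV data. I have looked at similar questions on this site, and nothing I found makes me believe this is impossible with the requests library.
Code: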
import requests

url = "http://www.nasdaq.com/screening/companies-by-industry.aspx?exchange=NASDAQ&render=download"

# Fake Firefox headers
headers = {
    "Host": "www.nasdaq.com",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:42.0) Gecko/20100101 Firefox/42.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.5",
    "Accept-Encoding": "gzip, deflate",
    "DNT": "1",
    "Cookie": "clientPrefs=||||lightg; userSymbolList=EOD+&DIT; userCookiePref=true; selectedsymbolindustry=EOD,; selectedsymboltype=EOD,EVERGREEN GLOBAL DIVIDEND OPPORTUNITY FUND COMMON SHARES OF BENEFICIAL INTEREST,NYSE; c_enabled$=true",
    "Connection": "keep-alive",
}

# Get the list
response = requests.get(url, headers, stream=True)
print(response.status_code)

# Write server response to file
with open("nasdaq.csv", 'wb') as f:
    for chunk in response.iter_content(chunk_size=1024):
        if chunk:  # filter out keep-alive new chunks
            f.write(chunk)
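You don't need to supply any headers. (Note that the question's code passes headers as the second positional argument of requests.get, which is params, so the dict is sent as query parameters rather than as HTTP headers; headers=headers would be needed to actually send them.)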
import requests

url = "http://www.nasdaq.com/screening/companies-by-industry.aspx?exchange=NASDAQ&render=download"
response = requests.get(url, stream=True)
print(response.status_code)

# Write server response to file
with open("nasdaq.csv", 'wb') as f:
    for chunk in response.iter_content(chunk_size=1024):
        if chunk:  # filter out keep-alive new chunks
            f.write(chunk)
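You can also just write the content:
import requests

url = "http://www.nasdaq.com/screening/companies-by-industry.aspx?exchange=NASDAQ&render=download"

# Write server response to file
with open("nasdaq.csv", 'wb') as f:
    f.write(requests.get(url).content)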
Or use urllib (Python 2):
import urllib
urllib.urlretrieve("http://www.nasdaq.com/screening/companies-by-industry.aspx?exchange=NASDAQ&render=download", "nasdaq.csv")
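In Python 3, urlretrieve lives in urllib.request rather than at the top level of urllib. The following is a minimal sketch of the same idea; the download helper is a hypothetical name introduced here, and the demonstration uses a local file:// URL in place of the NASDAQ link so that it runs without network access.

```python
import tempfile
import urllib.request
from pathlib import Path


def download(url, dest):
    """Save the resource at `url` to `dest` and return the destination path.

    `download` is an illustrative helper, not part of the original answer.
    """
    # urlretrieve returns a (local_filename, headers) tuple
    filename, _headers = urllib.request.urlretrieve(url, dest)
    return Path(filename)


if __name__ == "__main__":
    # Offline demonstration: a local file:// URL stands in for the real link.
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "source.csv"
        src.write_text('"Symbol","Name"\n"TFSC","1347 Capital Corp."\n')
        out = download(src.as_uri(), str(Path(tmp) / "nasdaq.csv"))
        print(out.read_text())
```

To fetch the real file, the same helper would be called with the NASDAQ URL and "nasdaq.csv", assuming the endpoint still serves the CSV.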
All of these approaches give you a CSV file with 3137 lines:
"Symbol","Name","LastSale","MarketCap","ADR TSO","IPOyear","Sector","Industry","Summary Quote",
"TFSC","1347 Capital Corp.","9.79","58230920","n/a","2014","Finance","Business Services","http://www.nasdaq.com/symbol/tfsc",
"TFSCR","1347 Capital Corp.","0.15","0","n/a","2014","Finance","Business Services","http://www.nasdaq.com/symbol/tfscr",
"TFSCU","1347 Capital Corp.","10","41800000","n/a","2014","Finance","Business Services","http://www.nasdaq.com/symbol/tfscu",
"TFSCW","1347 Capital Corp.","0.178","0","n/a","2014","Finance","Business Services","http://www.nasdaq.com/symbol/tfscw",
"PIH","1347 Property Insurance Holdings, Inc.","7.51","46441171.61","n/a","2014","Finance","Property-Casualty Insurers","http://www.nasdaq.com/symbol/pih",
"FLWS","1-800 FLOWERS.COM, Inc.","7.87","510463090.04","n/a","1999","Consumer Services","Other Specialty Stores","http://www.nasdaq.com/symbol/flws",
"FCTY","1st Century Bancshares, Inc","7.81","80612492.62","n/a","n/a","Finance","Major Banks","http://www.nasdaq.com/symbol/fcty",
"FCCY","1st Constitution Bancorp (NJ)","12.39","93508122.96","n/a","n/a","Finance","Savings Institutions","http://www.nasdaq.com/symbol/fccy",
"SRCE","1st Source Corporation","30.54","796548769.38","n/a","n/a","Finance","Major Banks","http://www.nasdaq.com/symbol/srce",
"VNET","21Vianet Group, Inc.","20.26","1035270865.78","51099253","2011","Technology","Computer Software: Programming, Data Processing","http://www.nasdaq.com/symbol/vnet",
...................................
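If for some reason this doesn't work for you, you may need to upgrade your version of requests.
Actually, you don't need those headers. You don't even need to save to a file.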
import csv
import requests

url = "http://www.nasdaq.com/screening/companies-by-industry.aspx?exchange=NASDAQ&render=download"
response = requests.get(url)
# Python 2 syntax: response.content is a str here, and print is a statement
data = csv.DictReader(response.content.splitlines())
for row in data:
    print row
Sample output:
{'Sector': 'Technology', 'LastSale': '2.46', 'Name': 'Zynga Inc.', '': '', 'Summary Quote': 'http://www.nasdaq.com/symbol/znga', 'Symbol': 'ZNGA', 'Industry': 'EDP Services', 'MarketCap': '2295110123.7', 'IPOyear': '2011', 'ADR TSO': 'n/a'}
If you prefer, you can use csv.reader instead of DictReader.
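In Python 3, response.content is bytes, so the decoded response.text would be split into lines instead. The sketch below shows only the parsing step, using a few rows copied from the CSV listing above in place of a live download; parse_companies is a hypothetical helper name. Because every line of the file ends with a comma, DictReader produces an extra empty-string key (visible in the sample output), which the helper drops.

```python
import csv

# A few rows copied from the CSV shown earlier, standing in for response.text
SAMPLE = '''"Symbol","Name","LastSale","MarketCap","ADR TSO","IPOyear","Sector","Industry","Summary Quote",
"TFSC","1347 Capital Corp.","9.79","58230920","n/a","2014","Finance","Business Services","http://www.nasdaq.com/symbol/tfsc",
"PIH","1347 Property Insurance Holdings, Inc.","7.51","46441171.61","n/a","2014","Finance","Property-Casualty Insurers","http://www.nasdaq.com/symbol/pih",
'''


def parse_companies(text):
    """Parse CSV text into a list of dicts, dropping the empty trailing column."""
    rows = []
    for row in csv.DictReader(text.splitlines()):
        # The trailing comma on each line yields a field named ''
        row.pop("", None)
        rows.append(dict(row))
    return rows


if __name__ == "__main__":
    for company in parse_companies(SAMPLE):
        print(company["Symbol"], company["Name"])
```

With a live response, the call would be parse_companies(response.text), assuming the server still returns CSV in this layout.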
Another, shorter solution to this problem is:
import urllib
downloadFile = urllib.URLopener()
downloadFile.retrieve("http://www.nasdaq.com/screening/companies-by-industry.aspx?exchange=NASDAQ&render=download", "companylist.csv")
This code uses the urllib library to create a URL opener object (downloadFile), then retrieves the data from the NASDAQ link and saves it as companylist.csv.
According to the Python documentation, if you want to send a custom User-Agent (for example, the Firefox User-Agent), you can subclass URLopener and set its version attribute to the user agent you want to use.
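Note: According to the Python documentation, urllib.URLopener() is deprecated as of Python v3.3, so it may eventually be removed from the Python standard library. However, as of the Python development (Dev) version current at the time of writing, urllib.URLopener() was still supported as a legacy interface.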
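Given that deprecation, one possible alternative in Python 3 is to set the User-Agent on a urllib.request.Request object instead of subclassing URLopener. This is a sketch under that assumption, not the method from the answer above; build_request and download are hypothetical helper names, and the User-Agent string is the one from the question.

```python
import shutil
import urllib.request

URL = "http://www.nasdaq.com/screening/companies-by-industry.aspx?exchange=NASDAQ&render=download"
# Firefox User-Agent string taken from the question
FIREFOX_UA = "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:42.0) Gecko/20100101 Firefox/42.0"


def build_request(url, user_agent=FIREFOX_UA):
    """Build a GET request that carries a custom User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download(url, dest, user_agent=FIREFOX_UA):
    """Fetch `url` with the custom User-Agent and stream the body to `dest`."""
    request = build_request(url, user_agent)
    with urllib.request.urlopen(request) as resp, open(dest, "wb") as f:
        shutil.copyfileobj(resp, f)


if __name__ == "__main__":
    # No network access here: only show the header that would be sent.
    # Request stores header names capitalized, hence "User-agent".
    print(build_request(URL).get_header("User-agent"))
```

Calling download(URL, "companylist.csv") would then perform the actual transfer, assuming the endpoint is still available.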