I'm trying to use the Python requests library to download a file from this link: http://www.nasdaq.com/screening/companies-by-industry.aspx?exchange=NASDAQ&render=download
Clicking on this link will give you a file (nasdaq.csv) only when using a browser. I used the Firefox Network Monitor (Ctrl-Shift-Q) to retrieve all the headers that Firefox sends. I now finally get a 200 server response, but still no file: the file that this script produces contains parts of the Nasdaq website, not the CSV data. I looked at similar questions on this site, and nothing leads me to believe that this shouldn't be possible with the requests library.
Code:
import requests
url = "http://www.nasdaq.com/screening/companies-by-industry.aspx?exchange=NASDAQ&render=download"
# Fake Firefox headers
headers = {"Host" : "www.nasdaq.com", \
"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64; rv:42.0) Gecko/20100101 Firefox/42.0", \
"Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8", \
"Accept-Language": "en-US,en;q=0.5", \
"Accept-Encoding": "gzip, deflate", \
"DNT": "1", \
"Cookie": "clientPrefs=||||lightg; userSymbolList=EOD+&DIT; userCookiePref=true; selectedsymbolindustry=EOD,; selectedsymboltype=EOD,EVERGREEN GLOBAL DIVIDEND OPPORTUNITY FUND COMMON SHARES OF BENEFICIAL INTEREST,NYSE; c_enabled$=true", \
"Connection": "keep-alive", }
# Get the list
response = requests.get(url, headers, stream=True)
print(response.status_code)
# Write server response to file
with open("nasdaq.csv", 'wb') as f:
for chunk in response.iter_content(chunk_size=1024):
if chunk: # filter out keep-alive new chunks
f.write(chunk)
You don't need to supply any headers:
import requests
url = "http://www.nasdaq.com/screening/companies-by-industry.aspx?exchange=NASDAQ&render=download"
response = requests.get(url, stream=True)
print(response.status_code)
# Write server response to file
with open("nasdaq.csv", 'wb') as f:
for chunk in response.iter_content(chunk_size=1024):
if chunk: # filter out keep-alive new chunks
f.write(chunk)
You can also just write the content:
import requests
url = "http://www.nasdaq.com/screening/companies-by-industry.aspx?exchange=NASDAQ&render=download"
# Write server response to file
with open("nasdaq.csv", 'wb') as f:
    f.write(requests.get(url).content)
Or use urllib:
import urllib.request
urllib.request.urlretrieve("http://www.nasdaq.com/screening/companies-by-industry.aspx?exchange=NASDAQ&render=download", "nasdaq.csv")
All methods give you the 3137-line CSV file:
"Symbol","Name","LastSale","MarketCap","ADR TSO","IPOyear","Sector","Industry","Summary Quote",
"TFSC","1347 Capital Corp.","9.79","58230920","n/a","2014","Finance","Business Services","http://www.nasdaq.com/symbol/tfsc",
"TFSCR","1347 Capital Corp.","0.15","0","n/a","2014","Finance","Business Services","http://www.nasdaq.com/symbol/tfscr",
"TFSCU","1347 Capital Corp.","10","41800000","n/a","2014","Finance","Business Services","http://www.nasdaq.com/symbol/tfscu",
"TFSCW","1347 Capital Corp.","0.178","0","n/a","2014","Finance","Business Services","http://www.nasdaq.com/symbol/tfscw",
"PIH","1347 Property Insurance Holdings, Inc.","7.51","46441171.61","n/a","2014","Finance","Property-Casualty Insurers","http://www.nasdaq.com/symbol/pih",
"FLWS","1-800 FLOWERS.COM, Inc.","7.87","510463090.04","n/a","1999","Consumer Services","Other Specialty Stores","http://www.nasdaq.com/symbol/flws",
"FCTY","1st Century Bancshares, Inc","7.81","80612492.62","n/a","n/a","Finance","Major Banks","http://www.nasdaq.com/symbol/fcty",
"FCCY","1st Constitution Bancorp (NJ)","12.39","93508122.96","n/a","n/a","Finance","Savings Institutions","http://www.nasdaq.com/symbol/fccy",
"SRCE","1st Source Corporation","30.54","796548769.38","n/a","n/a","Finance","Major Banks","http://www.nasdaq.com/symbol/srce",
"VNET","21Vianet Group, Inc.","20.26","1035270865.78","51099253","2011","Technology","Computer Software: Programming, Data Processing","http://www.nasdaq.com/symbol/vnet",
...................................
If for some reason it does not work for you, you might need to upgrade your version of requests.
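A quick way to check which version is installed before upgrading (this assumes requests is importable in your environment):

```python
import requests

# Print the installed requests version; if it is old,
# upgrade with: pip install --upgrade requests
print(requests.__version__)
```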
You actually don't need those headers. You don't even need to save to a file.
import requests
import csv
url = "http://www.nasdaq.com/screening/companies-by-industry.aspx?exchange=NASDAQ&render=download"
response = requests.get(url)
data = csv.DictReader(response.text.splitlines())
for row in data:
    print(row)
Sample output:
{'Sector': 'Technology', 'LastSale': '2.46', 'Name': 'Zynga Inc.', '': '', 'Summary Quote': 'http://www.nasdaq.com/symbol/znga', 'Symbol': 'ZNGA', 'Industry': 'EDP Services', 'MarketCap': '2295110123.7', 'IPOyear': '2011', 'ADR TSO': 'n/a'}
You can use csv.reader instead of DictReader if you like.
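For illustration, here is csv.reader applied to a small inline sample in the same shape as the downloaded file (the sample data is made up, so this runs without a network connection):

```python
import csv

# A tiny inline sample mirroring the downloaded file's layout (hypothetical data).
sample = '"Symbol","Name","LastSale"\n"ZNGA","Zynga Inc.","2.46"\n'

rows = list(csv.reader(sample.splitlines()))
for row in rows:
    print(row)  # each row is a plain list instead of a dict
```

The first row is the header, so with csv.reader you index columns by position rather than by name.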
An alternative, and shorter, solution for this problem would be:
import urllib.request
downloadFile = urllib.request.URLopener()
downloadFile.retrieve("http://www.nasdaq.com/screening/companies-by-industry.aspx?exchange=NASDAQ&render=download", "companylist.csv")
This code uses the URL library to create a URL opener object (downloadFile), then retrieves the data from the NASDAQ link and saves it as companylist.csv.
According to the Python documentation, if you want to send a custom User-Agent (such as the Firefox User-Agent), you can subclass URLopener and set the version attribute to the user-agent you would like to use.
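A minimal sketch of that subclassing approach (the class name and User-Agent string are illustrative; note that URLopener is deprecated and emits a DeprecationWarning on modern Pythons):

```python
import urllib.request

class FirefoxOpener(urllib.request.URLopener):
    # The version attribute becomes the User-Agent header sent with each request.
    version = "Mozilla/5.0 (Windows NT 10.0; rv:42.0) Gecko/20100101 Firefox/42.0"

opener = FirefoxOpener()  # may warn: URLopener is deprecated since Python 3.3
print(opener.version)
```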
Note: According to the Python documentation, urllib.request.URLopener() has been deprecated since Python 3.3 and may eventually be removed from the standard library. However, as of Python 3.6 (dev), it is still supported as a legacy interface.