I was using urllib in Python to get stock prices from Yahoo Finance. Here is my code so far:
import urllib
import re
name = raw_input(">")
htmlfile = urllib.urlopen("http://finance.yahoo.com/q?s=%s" % name)
htmltext = htmlfile.read()
# The problem area
regex = '<span id="yfs_l84_%s">(.+?)</span>' % name
pattern = re.compile(regex)
price = re.findall(pattern, htmltext)
print price
The idea is that I enter a ticker symbol and the stock price comes out. So far, though, I can't get it to display a price, just an empty list []. I have commented where I believe the problem is. Any suggestions? Thanks.
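You have not escaped the forward slash in your regex. (Strictly speaking, Python's re module does not treat / as a special character, so the escape is harmless rather than required.) Change your regex from: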
<span id="yfs_l84_%s">(.+?)</span>
to
<span id="yfs_l84_goog">(.+?)<\/span>
This will fix your problem, assuming you enter the company's listing code as the input to your code, e.g. goog for Google. The match is case-sensitive, so the input has to match the id in the page exactly.
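As a rough sketch of the point about the input, the pattern can be built from a lowercased, escaped symbol and tried against a small fragment. The helper name find_price, the sample fragment, and the price in it are made up for illustration, and the assumption that the page uses the lowercase symbol in the id comes only from the goog example above.

```python
import re

def find_price(html, symbol):
    """Return the text of the span whose id is yfs_l84_<symbol>, or None."""
    # Assumption: the page uses the lowercase ticker in the span id.
    span_id = "yfs_l84_" + re.escape(symbol.lower())
    # "/" is not special in Python's re module, so "</span>" matches as written.
    pattern = re.compile(r'<span id="%s">(.+?)</span>' % span_id)
    match = pattern.search(html)
    return match.group(1) if match else None

# Hypothetical fragment standing in for the downloaded page.
sample = '<div><span id="yfs_l84_goog">1,005.07</span></div>'
print(find_price(sample, "GOOG"))
```

Returning None when nothing matches makes the "blank list" case explicit instead of printing an empty result.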
That said, regex is a bad choice for what you are trying to do. As suggested by others, explore BeautifulSoup, which is a Python library for pulling data out of HTML. With BeautifulSoup your code can be as simple as:
from bs4 import BeautifulSoup
import requests
name = raw_input('>')
url = 'http://finance.yahoo.com/q?s={}'.format(name)
r = requests.get(url)
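# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(r.text, 'html.parser')
# The id needs a {} placeholder, otherwise format() leaves it as 'yfs_l84_'
data = soup.find('span', attrs={'id': 'yfs_l84_{}'.format(name)})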
print data.text
Any reason you can't use pandas? It has good support for financial data scraping and time series analysis.
http://pandas.pydata.org/pandas-docs/stable/remote_data.html
Here's the Yahoo example straight from the documentation:
In [1]: import pandas.io.data as web
In [2]: import datetime
In [3]: start = datetime.datetime(2010, 1, 1)
In [4]: end = datetime.datetime(2013, 1, 27)
In [5]: f=web.DataReader("F", 'yahoo', start, end)
In [6]: f.ix['2010-01-04']
Out[6]:
Open 10.17
High 10.28
Low 10.05
Close 10.28
Volume 60855800.00
Adj Close 9.75
Name: 2010-01-04 00:00:00, dtype: float64
This guide will show you how to build Yahoo Finance queries that return CSV files. You can then parse them easily with the csv library.
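A minimal sketch of the parsing step, assuming the download has already been read into a string. The column names and the helper parse_quotes are assumptions for illustration; the single sample row reuses the figures from the pandas output above, and a real file may be laid out differently.

```python
import csv
import io

# Hypothetical sample of a downloaded quote file; real column names may differ.
sample_csv = """Date,Open,High,Low,Close,Volume
2010-01-04,10.17,10.28,10.05,10.28,60855800
"""

def parse_quotes(text):
    """Parse CSV text into a list of dicts with numeric price fields."""
    rows = []
    # DictReader uses the first line as the field names.
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "date": row["Date"],
            "open": float(row["Open"]),
            "high": float(row["High"]),
            "low": float(row["Low"]),
            "close": float(row["Close"]),
            "volume": int(row["Volume"]),
        })
    return rows

for quote in parse_quotes(sample_csv):
    print(quote["date"], quote["close"])
```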
If you really want to try hacking through the HTML, use BeautifulSoup. HTML can't be parsed easily with regexes.
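To illustrate parsing rather than pattern matching without any third-party install, here is a sketch that uses the standard library's html.parser in place of BeautifulSoup (Python 3; the module is called HTMLParser in Python 2). The class name SpanTextParser, the sample fragment, and the price in it are made up, and nested spans are not handled.

```python
from html.parser import HTMLParser

class SpanTextParser(HTMLParser):
    """Collect the text inside the first <span> that has the given id."""

    def __init__(self, target_id):
        HTMLParser.__init__(self)
        self.target_id = target_id
        self.inside = False
        self.text = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == "span" and self.text is None and dict(attrs).get("id") == self.target_id:
            self.inside = True
            self.text = ""

    def handle_endtag(self, tag):
        if tag == "span":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.text += data

# Hypothetical fragment standing in for the downloaded page.
sample = '<div><span id="yfs_l84_goog">1,005.07</span> <span id="other">x</span></div>'
parser = SpanTextParser("yfs_l84_goog")
parser.feed(sample)
parser.close()
print(parser.text)
```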
The best way to get data from Yahoo Finance using Python 2 or Python 3 is with a POST request.
You can easily test this out using a REST client like Postman.
Open Postman, set the method to POST, and use the URL shown in the code below. Then you will see the data. Simply re-create this in Python:
import requests
url = "https://query1.finance.yahoo.com/v7/finance/download/GOOG?period1=1519938930&period2=1522354530&interval=1d&events=history&crumb=.tLvYBkGDu3"
response = requests.post(url)
print response.text
I used to get the data using urllib2, but it gives an authorization error now. They are probably filtering everything through REST methods like GET and POST.
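The query string in the URL above can also be assembled from dates instead of being typed by hand. This is only a sketch: it assumes period1 and period2 are Unix timestamps in seconds (UTC), which matches the magnitude of the values above but is not confirmed here. The helper name build_history_url and the crumb value YOUR_CRUMB are placeholders, and the code only builds the string without sending a request.

```python
import calendar
import datetime

try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2

def build_history_url(symbol, start, end, crumb):
    """Build a download URL shaped like the one in the answer above."""
    params = [
        # Assumption: the period fields are Unix timestamps in seconds (UTC).
        ("period1", calendar.timegm(start.timetuple())),
        ("period2", calendar.timegm(end.timetuple())),
        ("interval", "1d"),
        ("events", "history"),
        ("crumb", crumb),  # placeholder; a real crumb has to come from the site
    ]
    base = "https://query1.finance.yahoo.com/v7/finance/download/" + symbol
    return base + "?" + urlencode(params)

url = build_history_url(
    "GOOG",
    datetime.datetime(2018, 3, 1),
    datetime.datetime(2018, 3, 29),
    "YOUR_CRUMB",
)
print(url)
```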