
Using python and urllib to get data from Yahoo Finance

I was using urllib in Python to get stock prices from Yahoo Finance. Here is my code so far:

import urllib
import re

name = raw_input(">")

htmlfile = urllib.urlopen("http://finance.yahoo.com/q?s=%s" % name)

htmltext = htmlfile.read()

# The problem area
regex = '<span id="yfs_l84_%s">(.+?)</span>' % name

pattern = re.compile(regex)

price = re.findall(pattern, htmltext)

print price

So I enter a value, and the stock price should come out. But so far I can't get it to display a price, just an empty list [ ]. I have added a comment above where I believe the problem is. Any suggestions? Thanks.

You have not escaped the forward slash in your regex. Change your regex from:

<span id="yfs_l84_%s">(.+?)</span>

to

<span id="yfs_l84_goog">(.+?)<\/span>

This will fix your problem, assuming you enter the company's listing code as the input to your code, for example goog for Google.
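To make the suggestion above concrete, here is a minimal sketch of the regex approach run against a made-up HTML snippet rather than the live page. It is written for Python 3. The sample markup, the `find_price` helper, and the choice to lower-case the symbol are assumptions based on the `yfs_l84_goog` example, not something confirmed against Yahoo's current pages.

```python
import re

# Hypothetical sample of the markup the question is trying to match.
SAMPLE_HTML = '<div><span id="yfs_l84_goog">1,021.66</span></div>'


def find_price(html, symbol):
    """Return every value captured from the span whose id ends in the symbol."""
    # Assumption: the id uses the lower-case symbol, as in the answer's example.
    span_id = "yfs_l84_" + re.escape(symbol.lower())
    # In Python's re module "/" is not a metacharacter, so "</span>" and
    # "<\/span>" match exactly the same text.
    pattern = re.compile(r'<span id="%s">(.+?)</span>' % span_id)
    return pattern.findall(html)


prices = find_price(SAMPLE_HTML, "GOOG")
print(prices)
```

If the id on the real page is lower-case, entering `GOOG` without normalising the case is one way to end up with an empty list, which is why the helper calls `lower()` before building the pattern.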

That said, regex is a bad choice for what you are trying to do. As suggested by others, explore BeautifulSoup, which is a Python library for pulling data out of HTML. With BeautifulSoup your code can be as simple as:

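from bs4 import BeautifulSoup
import requests

name = raw_input('>')
url = 'http://finance.yahoo.com/q?s={}'.format(name)
r = requests.get(url)
# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(r.text, 'html.parser')
# The '{}' placeholder is needed for the symbol to end up in the id
data = soup.find('span', attrs={'id': 'yfs_l84_{}'.format(name)})
print data.text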
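If installing BeautifulSoup is not an option, the same "find the span by id" idea can be sketched with the standard library's `html.parser` instead. This is a different, more manual technique than the BeautifulSoup call above, written for Python 3. The `SpanTextFinder` class and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class SpanTextFinder(HTMLParser):
    """Collect the text inside the first <span> that has a given id.

    This sketch does not handle spans nested inside the target span.
    """

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.inside = False
        self.text = None  # stays None if the id is never seen

    def handle_starttag(self, tag, attrs):
        is_target = tag == "span" and dict(attrs).get("id") == self.target_id
        if is_target and self.text is None:
            self.inside = True
            self.text = ""

    def handle_endtag(self, tag):
        if tag == "span":
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.text += data


# Hypothetical sample markup, not the real Yahoo page.
SAMPLE_HTML = '<p>Last: <span id="yfs_l84_goog">1,021.66</span> <span id="other">x</span></p>'

finder = SpanTextFinder("yfs_l84_goog")
finder.feed(SAMPLE_HTML)
print(finder.text)
```

Because a real parser tracks tags rather than raw characters, this kind of lookup is less sensitive to attribute order and whitespace than a regex.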

Any reason you can't use pandas? It has good support for financial data scraping and time series analysis.

http://pandas.pydata.org/pandas-docs/stable/remote_data.html

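Here's the Yahoo example straight from the documentation. Note that later pandas releases moved these remote data readers into the separate pandas-datareader package and replaced .ix with .loc, so the session below reflects the older API: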

In [1]: import pandas.io.data as web
In [2]: import datetime
In [3]: start = datetime.datetime(2010, 1, 1)
In [4]: end = datetime.datetime(2013, 1, 27)
In [5]: f=web.DataReader("F", 'yahoo', start, end)
In [6]: f.ix['2010-01-04']
Out[6]: 
Open               10.17
High               10.28
Low                10.05
Close              10.28
Volume       60855800.00
Adj Close           9.75
Name: 2010-01-04 00:00:00, dtype: float64

This guide will show you how to build Yahoo Finance queries that return CSV files. Then you can use the csv library to parse them easily.
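As a rough illustration of the parsing half of that suggestion, the sketch below reads CSV text with the standard `csv` module in Python 3. The column names and the sample rows are assumptions about what a daily price download looks like, and the values are illustrative only. `parse_prices` is a hypothetical helper.

```python
import csv
import io

# Hypothetical CSV text in the shape of a daily price download.
SAMPLE_CSV = """Date,Open,High,Low,Close,Volume,Adj Close
2010-01-04,10.17,10.28,10.05,10.28,60855800,9.75
2010-01-05,10.45,11.24,10.40,10.96,215620200,10.39
"""


def parse_prices(csv_text):
    """Turn CSV text into a list of dicts with typed close and volume fields."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {
            "date": row["Date"],
            "close": float(row["Close"]),
            "volume": int(row["Volume"]),
        }
        for row in reader
    ]


rows = parse_prices(SAMPLE_CSV)
print(rows[0]["close"])
```

With a real download, the text passed to `parse_prices` would be the body of the HTTP response instead of the inline sample.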

If you really want to try hacking through the HTML, use BeautifulSoup. HTML can't be parsed easily with regexes.

The best way to get data from Yahoo Finance using Python 2 or Python 3 is by using a POST method.
You can easily test this out using a REST client like Postman.

Open Postman, set the method to POST, and use this. You will then see the data. Simply re-create this in Python:

import requests
url = "https://query1.finance.yahoo.com/v7/finance/download/GOOG?period1=1519938930&period2=1522354530&interval=1d&events=history&crumb=.tLvYBkGDu3"

response = requests.post(url)
print response.text

I used to get the data using urllib2, but it now gives an authorization error. They are probably filtering everything through REST methods like GET and POST.
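Typing the query string by hand makes it easy to introduce a stray space or a mis-encoded character. A minimal Python 3 sketch of building the same URL from its parts is shown here. The `build_download_url` helper is hypothetical, and the crumb value is copied from the answer above and is probably tied to a session, so treat it as a placeholder.

```python
from urllib.parse import quote, urlencode

# Endpoint taken from the answer above; it may no longer be served.
BASE_URL = "https://query1.finance.yahoo.com/v7/finance/download/"


def build_download_url(symbol, period1, period2, crumb, interval="1d"):
    """Assemble the download URL from a symbol and its query parameters."""
    params = {
        "period1": period1,   # start of the range, as given in the answer
        "period2": period2,   # end of the range, as given in the answer
        "interval": interval,
        "events": "history",
        "crumb": crumb,       # placeholder value, see the note above
    }
    # urlencode handles the "&" separators and any escaping that is needed.
    return BASE_URL + quote(symbol) + "?" + urlencode(params)


url = build_download_url("GOOG", 1519938930, 1522354530, ".tLvYBkGDu3")
print(url)
```

The resulting string can be passed to `requests.post(url)` exactly as in the answer's snippet.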
