Python Web Scraping Problems
I am using Python to scrape AAPL's stock price from Yahoo Finance, but the program always returns [].
I would appreciate it if someone could point out why the program is not working. Here is my code:
import urllib
import re
htmlfile=urllib.urlopen("https://ca.finance.yahoo.com/q?s=AAPL&ql=0")
htmltext=htmlfile.read()
regex='<span id=\"yfs_l84_aapl\" class="">(.+?)</span>'
pattern=re.compile(regex)
price=re.findall(pattern,htmltext)
print price
The original page source looks like this:
<span id="yfs_l84_aapl" class>112.31</span>
Here I just want the price, 112.31. When I copy and paste the code, I find that 'class' changes to 'class=""'. I also tried this code:
regex='<span id=\"yfs_l84_aapl\" class="">(.+?)</span>'
But it does not work either.
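One quick way to see why findall returns [] is to print what the server actually sent around the span id, instead of relying on what the browser's inspector or a copy-paste shows (browsers normalize attributes, which may be why 'class' turned into 'class=""'). A minimal sketch, assuming the page has already been downloaded into a string; span_context is a hypothetical helper name and the sample markup stands in for the real page:

```python
def span_context(html, span_id, width=40):
    """Return the text around the first id="span_id" in html, or None if absent."""
    marker = 'id="%s"' % span_id
    pos = html.find(marker)
    if pos == -1:
        return None  # the id does not appear in the downloaded HTML at all
    start = max(pos - width, 0)
    return html[start:pos + len(marker) + width]


# Sample markup standing in for the downloaded page (assumption: not fetched live).
sample = '<div><span id="yfs_l84_aapl">112.31</span></div>'
print(span_context(sample, "yfs_l84_aapl", width=10))
```

Comparing that printed snippet character by character with your regex usually shows the mismatch immediately.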
Well, the good news is that you are getting the data. You were nearly there.
I would recommend that you work out your regex problems in a tool that helps, e.g. regex101.
Anyway, here is your working regex:
regex=r'<span id="yfs_l84_aapl">(\d*\.\d\d)'
You are collecting only digits, so don't use a general catch-all like (.+?); be specific where you can. This pattern matches any number of digits, then a literal decimal point, then exactly two more digits.
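To try the pattern without hitting the network, you can run it against a literal copy of the markup. A small Python 3 sketch; the sample strings are copied from the snippets in this thread, not fetched live. Note that this pattern only matches when nothing sits between the id and the closing ">":

```python
import re

# Pattern from the answer, as a raw string so "\d" and "\." reach re unchanged.
PRICE_RE = re.compile(r'<span id="yfs_l84_aapl">(\d*\.\d\d)')

served = '<span id="yfs_l84_aapl">112.31</span>'            # no class attribute
with_class = '<span id="yfs_l84_aapl" class>112.31</span>'  # markup quoted in the question

print(PRICE_RE.findall(served))      # → ['112.31']
print(PRICE_RE.findall(with_class))  # → [] because " class" comes before ">"
```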
When I went to the Yahoo site you provided, I saw a span tag without a class attribute:
<span id="yfs_l84_aapl">112.31</span>
I'm not sure what you are trying to do with "class". Without it, I get 112.31:
import urllib
import re
htmlfile=urllib.urlopen("https://ca.finance.yahoo.com/q?s=AAPL&ql=0")
htmltext=htmlfile.read()
regex='<span id=\"yfs_l84_aapl\">(.+?)</span>'
pattern=re.compile(regex)
price=re.findall(pattern,htmltext)
print price
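If you can't be sure whether the server will send class, class="", or no class attribute at all, one option (my suggestion, not something either answer uses) is to let [^>]* skip any extra attributes after the id. A Python 3 sketch, checked here only against the three variants quoted in this thread:

```python
import re

# [^>]* skips any further attributes (class, class="", ...) up to the closing ">";
# ([^<]+) captures the text up to the next tag.
PRICE_RE = re.compile(r'<span id="yfs_l84_aapl"[^>]*>([^<]+)</span>')

variants = [
    '<span id="yfs_l84_aapl">112.31</span>',
    '<span id="yfs_l84_aapl" class>112.31</span>',
    '<span id="yfs_l84_aapl" class="">112.31</span>',
]
for html in variants:
    print(PRICE_RE.findall(html))  # → ['112.31'] for each variant
```

This still breaks if the attribute order changes (e.g. class before id), which is one reason the HTML-parser approach below is sturdier.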
I am using BeautifulSoup to get the text from the span tag:
import urllib
from BeautifulSoup import BeautifulSoup
response =urllib.urlopen("https://ca.finance.yahoo.com/q?s=AAPL&ql=0")
html = response.read()
soup = BeautifulSoup(html)
# find all spans that have id = 'yfs_l84_aapl'
target = soup.findAll('span',{'id':"yfs_l84_aapl"})
# target is a list
print(target[0].string)
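If you would rather not install BeautifulSoup, Python 3's standard-library html.parser can do the same lookup. This is a swapped-in technique, not what the answer above uses; SpanTextById and span_text are hypothetical names, and the sketch assumes the target span contains no nested span tags:

```python
from html.parser import HTMLParser


class SpanTextById(HTMLParser):
    """Collect the text inside the <span> whose id attribute equals target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.inside = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; a bare "class" arrives as ("class", None)
        if tag == "span" and dict(attrs).get("id") == self.target_id:
            self.inside = True

    def handle_endtag(self, tag):
        # Assumes no nested <span>: the first </span> ends the target.
        if tag == "span" and self.inside:
            self.inside = False

    def handle_data(self, data):
        if self.inside:
            self.parts.append(data)


def span_text(html, span_id):
    """Return the stripped text of the span with the given id ('' if not found)."""
    parser = SpanTextById(span_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts).strip()


print(span_text('<div><span id="yfs_l84_aapl" class>112.31</span></div>', "yfs_l84_aapl"))
```

Because the parser reads attributes by name, it doesn't care whether class is present, empty, or listed before id.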