Retrieving a value from a URL in Python

Question

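I am trying to write a Python script that checks money.rediff.com for a particular stock price and prints it. I know this can be done easily with their API, but I want to learn how urllib2 works, so I am trying to do it the old-fashioned way. However, I am stuck on how to use urllib. Many online tutorials tell me to use "Inspect Element" on the value I need and split the string to get it. All the examples in those videos have values inside HTML tags that are easy to split on, whereas mine looks like this: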

<div class="f16">
<span id="ltpid" class="bold" style="color: rgb(0, 0, 0); background: rgb(255, 255, 255);">6.66</span> &nbsp; 
<span id="change" class="green">+0.50</span> &nbsp; 

<span id="ChangePercent" style="color: rgb(130, 130, 130); font-weight: normal;">+8.12%</span>
</div>

I only need the "6.66" from line 2. How do I go about doing this? I am very new to urllib2 and Python. All help will be greatly appreciated. Thanks in advance.

Answer 1

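You can certainly do this with urllib2 and a regular expression, but I would encourage you to use better tools, namely requests and Beautiful Soup.

Here is a complete program to fetch a quote for "Tata Motors Ltd":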

from bs4 import BeautifulSoup
import requests

html = requests.get('http://money.rediff.com/companies/Tata-Motors-Ltd/10510008').content

soup = BeautifulSoup(html, 'html.parser')
quote = float(soup.find(id='ltpid').get_text())

print(quote)

Edit

Here is a Python 2 version that uses only urllib2 and re:

import re
import urllib2

html = urllib2.urlopen('http://money.rediff.com/companies/Tata-Motors-Ltd/10510008').read()

quote = float(re.search('<span id="ltpid"[^>]*>([^<]*)', html).group(1))

print quote
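
For readers on Python 3, where urllib2 no longer exists and its functionality lives in urllib.request, the same approach might look like the sketch below. The helper names extract_quote and fetch_quote are hypothetical, and decoding the page as UTF-8 is an assumption about the site rather than something stated in the answer.

```python
import re
import urllib.request

# Same pattern as the Python 2 version: capture the text inside the
# span whose id is "ltpid".
LTP_PATTERN = re.compile(r'<span id="ltpid"[^>]*>([^<]*)')


def extract_quote(html):
    """Return the price found in `html` as a float (hypothetical helper).

    Raises ValueError if the ltpid span is not present.
    """
    match = LTP_PATTERN.search(html)
    if match is None:
        raise ValueError('no <span id="ltpid"> found in the page')
    return float(match.group(1))


def fetch_quote(url):
    """Download `url` and extract the quote (hypothetical helper)."""
    with urllib.request.urlopen(url) as response:
        # urlopen returns bytes in Python 3; UTF-8 is assumed here.
        html = response.read().decode('utf-8', errors='replace')
    return extract_quote(html)
```

Calling fetch_quote with the URL from the answer should then return the price, provided the page still uses the same markup. Keeping the parsing in extract_quote means it can be tried on a saved copy of the HTML without touching the network.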

Answer 2

BeautifulSoup is well suited to HTML parsing.

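from bs4 import BeautifulSoup

# Use your urllib code to get the source code of the page.
# The string below is only a stand-in copied from the question; replace it with the HTML you fetched.
source = '<span id="ltpid" class="bold">6.66</span>'
soup = BeautifulSoup(source, 'html.parser')
# This assumes the id 'ltpid' is always the one you are looking for
span = soup.find('span', id="ltpid")
print(float(span.text))  # prints 6.66 for the markup in the question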

Answer 3

Use BeautifulSoup instead of regular expressions to parse HTML.
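
If installing a third-party package is not an option, the same parser-based idea can be sketched with the standard library's html.parser module. This is not BeautifulSoup and is far less forgiving; the class IdTextParser and the function text_by_id are hypothetical names introduced only for this illustration.

```python
from html.parser import HTMLParser


class IdTextParser(HTMLParser):
    """Collect the text inside the first element with a given id.

    Hypothetical helper. Limitation: an unclosed void tag such as <br>
    inside the target element would throw off the depth counter.
    """

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self._depth = 0      # greater than 0 while inside the target element
        self._done = False   # True once the target element has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self._done:
            return
        if self._depth:
            self._depth += 1  # a nested tag inside the target
        elif dict(attrs).get('id') == self.target_id:
            self._depth = 1

    def handle_endtag(self, tag):
        if self._depth:
            self._depth -= 1
            if self._depth == 0:
                self._done = True

    def handle_data(self, data):
        if self._depth:
            self.chunks.append(data)


def text_by_id(html, target_id):
    """Return the stripped text of the element with `target_id`, or ''."""
    parser = IdTextParser(target_id)
    parser.feed(html)
    parser.close()
    return ''.join(parser.chunks).strip()
```

With the markup from the question, float(text_by_id(html, 'ltpid')) would give the price. Unlike the regular expression, this does not depend on the order of the attributes inside the span tag.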

3 answers

Answer 1: 2, accepted, 2016-08-26 03:43:28

Answer 2: 1, 2016-08-26 03:44:10

Answer 3: 1
