How do I scrape data from a website using Python 2?

Question

When I run this code, I keep getting empty brackets instead of the actual data.

I am trying to figure out why, since I don't receive any error messages.

import urllib
import re
symbolslist = ["aapl","spy","goog","nflx"]


for symbol in symbolslist:
    url = "http://finance.yahoo.com/q?s=%s&ql=1"%(symbol)
    htmlfile = urllib.urlopen(url)
    htmltext = htmlfile.read()
    regex = '<span id="yfs_184_%s">(.+?)</span>'%(symbol.lower())
    pattern = re.compile(regex)
    price = re.findall(pattern, htmltext)
    print price

Answer 1

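You get empty brackets because the element ID in your regular expression is wrong: it is not 184 but l84. The first character is a lowercase letter L, not the digit one. Since the pattern never matches, re.findall returns an empty list, and no error is raised.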
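To make the difference concrete, here is a minimal sketch that runs both patterns against a small HTML string. The sample markup and the price in it are made up for illustration (the real page may look different), and print is called with a single argument so the snippet behaves the same under Python 2 and 3.

```python
import re

# Hypothetical sample of the markup the question is targeting;
# the actual page content is an assumption here.
sample_html = '<span id="yfs_l84_aapl">112.34</span>'

symbol = "aapl"

# Pattern from the question: contains the digit one ("184"),
# which never appears in the markup, so nothing matches.
wrong = re.compile('<span id="yfs_184_%s">(.+?)</span>' % symbol)

# Corrected pattern: lowercase letter L ("l84").
right = re.compile('<span id="yfs_l84_%s">(.+?)</span>' % symbol)

wrong_matches = wrong.findall(sample_html)
right_matches = right.findall(sample_html)

print(wrong_matches)  # []
print(right_matches)  # ['112.34']
```

An empty list from re.findall is a normal return value rather than an error, which is why the original script fails silently.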

Answer 2

There are a number of libraries that can help you scrape sites. Take a look at Scrapy or Beautiful Soup; as far as I know, both support Python 2 and 3.
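As a rough illustration of the Beautiful Soup route, the sketch below looks the element up by its id instead of using a regular expression. It assumes the bs4 package is installed, and the sample markup and price are hypothetical stand-ins for the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; the real page structure is an assumption.
sample_html = '<div><span id="yfs_l84_aapl">112.34</span></div>'

# Parse with the standard-library backend so no extra parser is required.
soup = BeautifulSoup(sample_html, "html.parser")

# Find the span by id; find() returns None when nothing matches.
span = soup.find("span", id="yfs_l84_aapl")
price = span.get_text() if span is not None else None

print(price)  # 112.34
```

Checking for None makes a missing element explicit, whereas a mistyped regular expression just yields an empty list.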

2 answers

Answer 1: 1 2015-11-15 19:34:43

Answer 2: 0 2015-11-15 19:20:12
