Web scraping with IPython and lxml

Question

I am trying to get the menu items from this website:

http://new.holachef.com/daily_menus?menu_date=2015-07-06

I am using the following code to target the elements that contain the text:

from urllib2 import urlopen
from lxml.html import fromstring

def get_page(url):
    html = urlopen(url).read()
    dom = fromstring(html)
    dom.make_links_absolute(url)
    return dom

dom = get_page("http://new.holachef.com/daily_menus?menu_date=2015-07-06")
dom.cssselect("#store_item_64419 > ul > li.meal-discription.clearfix > div.col-xs-8 > h2 > a")

However, I get an empty output:

In [9]: dom.cssselect("#store_item_64419 > ul > li.meal-discription.clearfix > div.col-xs-8 > h2 > a")
Out[9]: []

I want to get the text inside that <a> tag.

Answer 1

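I think your script is running into the modal that asks the user to choose their location, so the HTML it receives does not contain the menu items.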
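A minimal sketch of how one might check this, assuming the menu markup follows the selector from the question. The helper name link_texts and both HTML snippets are hypothetical stand-ins, not the site's real responses; in practice you would pass in the string returned by urlopen(url).read(). XPath is used instead of cssselect only so the sketch has no extra dependency.

```python
from lxml.html import fromstring


def link_texts(html, item_id):
    """Return the text of every <h2><a> link inside the element with the given id."""
    dom = fromstring(html)
    # $item_id is an XPath variable, filled in from the keyword argument.
    links = dom.xpath('//*[@id=$item_id]//h2/a', item_id=item_id)
    return [a.text_content().strip() for a in links]


# Hypothetical markup: what the page might look like once a location is chosen.
menu_page = """
<html><body>
<div id="store_item_64419"><ul><li class="meal-discription clearfix">
<div class="col-xs-8"><h2><a href="/meal/1">Sample Dish</a></h2></div>
</li></ul></div>
</body></html>
"""

# Hypothetical markup: a response that only contains the location prompt.
location_page = """
<html><body><div class="modal"><h4>Choose your location</h4></div></body></html>
"""

print(link_texts(menu_page, "store_item_64419"))      # ['Sample Dish']
print(link_texts(location_page, "store_item_64419"))  # []
```

If the second case matches what you see, the selector itself is fine and the element is simply missing from the HTML your script downloads. Printing the fetched HTML and searching it for store_item_64419 would confirm that.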

0 2015-07-06 06:16:16
