Trouble with the XPath syntax for retrieving an img src attribute using Python

Question

I've been trying to work out the XPath syntax to parse this HTML, but I'm not getting the same results as others. I've been modeling my work on http://docs.python-guide.org/en/latest/scenarios/scrape/#web-scraping, but I can't get it to work for my HTML.

<div id="sku-8103">
    <!-- B:649 -->
    <input type="hidden" id="productIdPDP" value="1218866963585"/>
    <input type="hidden" id="skuIdPDP" value="8240103" />
    <input type="hidden" id="enableLightbox" value="" />
    <!-- B:780 -->
    <img src="http://images.bestbuy.com/BestBuy_US/en_US/images/global/buttons/btn_notorderable_pdp.gif" alt="Not Orderable" border="0" id="notorderable" />
    <input name="8240103" type="hidden" value="1">
    <!-- E:780 -->
    <!-- E:649 -->
    </div>

My code:

import pycurl
import sys
import cStringIO
from lxml import etree
from lxml import html

buf = cStringIO.StringIO()

c = pycurl.Curl()
c.setopt(c.URL, 'http://www.bestbuy.com/site/sony-playstation-4-500gb/8240103.p?id=1218866963585&skuId=8240103')
c.setopt(c.WRITEFUNCTION, buf.write)
c.perform()

data = buf.getvalue()
buf.close()

tree = html.fromstring(data)


product = tree.xpath('//div[@id="sku-8240103"]/img[@src]')
print product

The output is [] instead of the src value of the image. I also tried:

product = tree.xpath('//div[@id="sku-8240103"]/img[@src]/text()')

but that didn't work either.

Answer 1

Your HTML has this:

<div id="sku-8103">

You're searching with this:

product = tree.xpath('//div[@id="sku-8240103"]/img[@src]')

Notice the different SKU number? There are no matching nodes, so you get back the empty list, [].
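One quick way to spot a mismatch like this is to list the id values that are actually present before writing the full query. The following is a minimal sketch, assuming a trimmed copy of the markup from the question is pasted into a string; the names snippet, sku_ids and missing are illustrative, the live page may differ, and the code uses Python 3 print syntax rather than the Python 2 style of the question.

```python
from lxml import html

# Trimmed copy of the markup from the question (assumption: the live page may differ).
snippet = '''
<div id="sku-8103">
    <input type="hidden" id="skuIdPDP" value="8240103" />
    <img src="http://images.bestbuy.com/BestBuy_US/en_US/images/global/buttons/btn_notorderable_pdp.gif" alt="Not Orderable" border="0" id="notorderable" />
</div>
'''

tree = html.fromstring(snippet)

# Collect the id of every div whose id starts with "sku-".
sku_ids = tree.xpath('//div[starts-with(@id, "sku-")]/@id')
print(sku_ids)

# The query from the question looks for an id that is not in that list,
# so it matches nothing.
missing = tree.xpath('//div[@id="sku-8240103"]/img')
print(missing)
```

Comparing the printed ids with the id used in the query makes the difference between sku-8103 and sku-8240103 visible at a glance.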

If you change the query to use the id that is actually in the HTML:

product = tree.xpath('//div[@id="sku-8103"]/img[@src]')

You now get a single-element list, like this:

[<Element img at 0x10c85b890>]

And if you do this:

print product[0].attrib['src']

… you get this:

http://images.bestbuy.com/BestBuy_US/en_US/images/global/buttons/btn_notorderable_pdp.gif

Really, you don't need the [@src] part there. If you're trying to restrict the match to img elements that have a src attribute, what other img elements do you expect to see?
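XPath can also return the attribute value itself, which skips the .attrib lookup, and the same test shows why the /text() attempt in the question comes back empty. This is a hedged sketch against a trimmed copy of the markup from the question; the variable names are illustrative, and it is written for Python 3.

```python
from lxml import html

# Trimmed copy of the markup from the question (assumption: the live page may differ).
snippet = '''
<div id="sku-8103">
    <input type="hidden" id="skuIdPDP" value="8240103" />
    <img src="http://images.bestbuy.com/BestBuy_US/en_US/images/global/buttons/btn_notorderable_pdp.gif" alt="Not Orderable" border="0" id="notorderable" />
</div>
'''

tree = html.fromstring(snippet)

# Ending the path with /@src returns the attribute values as strings
# instead of returning the img elements.
srcs = tree.xpath('//div[@id="sku-8103"]/img/@src')
print(srcs[0] if srcs else 'no matching img')

# text() selects text nodes. An img element has no text content,
# so this list is empty even when the id is correct.
texts = tree.xpath('//div[@id="sku-8103"]/img/text()')
print(texts)
```

Guarding the index with a check on the list avoids an IndexError when the query matches nothing, which is exactly the situation the question ran into.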

Answer 1: accepted (2), 2013-11-27 01:39:01
