Parsing HTML with lxml and XPath

Question

I am trying to use lxml with Python because, after some reading and searching, the common recommendation is to use lxml over other parsing packages. I have the following DOM structure, and I managed to write the correct XPath; I double-checked it in XPath Checker to confirm that it is valid. The XPath works fine in XPath Checker, but when I use it with lxml in Python I do not get the result. In fact, I get an object instead of the actual text.

Here is my DOM structure:

<div class="pdsc-l">
<table width="100%" cellspacing="0" cellpadding="0" border="0">
<tbody>
<tr>
<tr>
<tr>
<tr>
<tr>
<tr>
<td width="35%" valign="top">
<font size="2" face="Arial, Helvetica, sans-serif">Brand</font>
</td>
<td width="65%" valign="top">
<font size="2" face="Arial, Helvetica, sans-serif">HTC</font>
</td>
</tr>
<tr>
<td width="35%" valign="top">
<td width="65%" valign="top">

The following XPath, which I wrote, gives me what I want:

//td//font[text()='Brand']/following::td[1]

But with lxml I am not getting the result.

This is my code:
    rawPage = urllib2.urlopen(request)
    read = rawPage.read()
    #print read
    tree = etree.HTML(read)    
    for tr in tree.xpath("//tr"):
        print tr.xpath("//td//font[text()='Brand']/following::td[1]")

Here is the output:

[<Element td at 0x10ad80b90>]
[<Element td at 0x10ad80b90>]
[<Element td at 0x10ad80b90>]
[<Element td at 0x10ad80b90>]
[<Element td at 0x10ad80b90>]
[<Element td at 0x10ad80b90>]
[<Element td at 0x10ad80b90>]
[<Element td at 0x10ad80b90>]
[<Element td at 0x10ad80b90>]
[<Element td at 0x10ad80b90>]
[<Element td at 0x10ad80b90>]
[<Element td at 0x10ad80b90>]
[<Element td at 0x10ad80b90>]
[<Element td at 0x10ad80b90>]
[<Element td at 0x10ad80b90>]
[<Element td at 0x10ad80b90>]
[<Element td at 0x10ad80b90>]
[<Element td at 0x10ad80b90>]
[<Element td at 0x10ad80b90>]
[<Element td at 0x10ad80b90>]
[<Element td at 0x10ad80b90>]
[<Element td at 0x10ad80b90>]
[<Element td at 0x10ad80b90>]
[<Element td at 0x10ad80b90>]
[<Element td at 0x10ad80b90>]
[<Element td at 0x10ad80b90>]
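
A note on the output above: every line shows the same element address, which is consistent with how an XPath beginning with // behaves. Even when it is evaluated from a tr element, // starts at the document root, so each iteration of the loop searches the whole page again. The sketch below, which assumes a small hypothetical two-row table rather than the real page, contrasts the absolute form with the relative form .// (written for Python 3):

```python
from lxml import etree

# Hypothetical two-row table, standing in for the real page.
html = """
<table>
  <tr><td><font>Brand</font></td><td><font>HTC</font></td></tr>
  <tr><td><font>Model</font></td><td><font>One X</font></td></tr>
</table>
"""
tree = etree.HTML(html)
rows = tree.xpath("//tr")

# "//td" is absolute: from any row it returns every td in the document.
absolute_counts = [len(tr.xpath("//td")) for tr in rows]

# ".//td" is relative: each row returns only its own cells.
relative_counts = [len(tr.xpath(".//td")) for tr in rows]

print(absolute_counts)  # [4, 4]
print(relative_counts)  # [2, 2]
```

Since the expression in the question is absolute anyway, it could probably be run once on tree, without the loop over the rows.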

I tried the following change, but I still don't get the result. The code below includes the URL; hopefully that will help in getting a better answer:

    import urllib2

    from lxml import etree
    from lxml.html import fromstring, tostring

    url = 'http://www.ebay.com/ctg/111176858'
    request = urllib2.Request(url)
    rawPage = urllib2.urlopen(request)
    read = rawPage.read()
    #print read
    tree = etree.HTML(read)
    for tr in tree.xpath("//tr"):
        t = tr.xpath("//td//font[text()='Brand']/following::td[1]")[0]
        print tostring(t)

Answer 1

Appending [0].text to the end of the expression in the print statement in your code should give you what you want. Basically, what is being printed in your question are single-element lists of lxml.etree._Element objects, which have attributes such as tag and text that you can use to get different properties. So, try:

tr.xpath("//td//font[text()='Brand']/following::td[1]")[0].text
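
One caveat, based only on the markup shown in the question: there the value HTC sits inside a nested font element, and an element's text attribute holds just the text that precedes its first child. In that case .text on the td may contain nothing but whitespace. The sketch below assumes a simplified, hypothetical copy of the table and shows two ways to collect the nested text as well (written for Python 3):

```python
from lxml import etree

# Simplified, hypothetical version of the table from the question.
html = """
<table>
  <tr>
    <td width="35%"><font size="2">Brand</font></td>
    <td width="65%">
      <font size="2">HTC</font>
    </td>
  </tr>
</table>
"""
tree = etree.HTML(html)
cells = tree.xpath("//td//font[text()='Brand']/following::td[1]")
td = cells[0]

# td.text is only the text before the first child element (whitespace here).
direct_text = td.text

# itertext() walks all descendant text nodes, so it reaches "HTC".
full_text = "".join(td.itertext()).strip()

# The XPath string() function returns the string value of the first match.
xpath_text = tree.xpath(
    "string(//td//font[text()='Brand']/following::td[1])"
).strip()

print(repr(direct_text))
print(full_text)
print(xpath_text)
```

If the real page puts the value directly inside the td, [0].text is enough; the itertext() and string() variants cover both layouts.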

Answer 1
8 2012-08-28 18:48:29
