findall and xpath problems

I have a text file called "html.txt" that contains some HTML code, as shown below:

<tr>
    <td class="name"><a href="/player/DAVID:RD" class=""><span>David Kwan</span> (DAVID)</a></td>
    <td class="teamid" style="">DAVID:RD</td>
    <td class="">District Player</td>
    <td class="">Red-Dragon Factory</td>
</tr>

Following the tutorial on the lxml website, I tried to use etree and the findall() method to extract the table data from the HTML code, but I cannot get the result printed as a string; what I get is <Element td at 0x267c1c0>.
I understand that findall() returns a list of elements, but indexing it with 0 does not help either. By trial and error I also tried the str function that XPath supports to force findall() to return a string, and that did not help either.

Can someone advise me on how to correct this?

from lxml import etree

page = open("C:/Python27/project/lxml_project/html.txt").read()
x = etree.HTML(page)
element = (x.findall('.//td[@class="teamid"]'))
print(element)

My second question: would xpath be a better solution than the findall method? Previously, when I tried xpath, it always returned only the first matching element, even though the HTML page contains multiple <td> tags. Is it possible to apply xpath recursively using the Python lxml library?

Use the Element.text attribute to retrieve the text content of an element:

elements = x.findall('.//td[@class="teamid"]')
print([elem.text for elem in elements])

.findall() returns a list; you can use .find() to retrieve just the first match (or None if there are no matching elements).
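As a rough sketch of how these pieces fit together, and of the xpath side of the question, the snippet below runs find(), findall() and xpath() against an inline copy of the markup. The inline string stands in for html.txt, and the second row (Anna Lee, ANNA:BT) is invented purely so that there is more than one match; it is not part of the original data.

```python
from lxml import etree

# Markup adapted from the question. The second <tr> is a made-up row,
# added only so that the queries have more than one match.
html = """
<table>
<tr>
    <td class="name"><a href="/player/DAVID:RD" class=""><span>David Kwan</span> (DAVID)</a></td>
    <td class="teamid" style="">DAVID:RD</td>
</tr>
<tr>
    <td class="name"><a href="/player/ANNA:BT" class=""><span>Anna Lee</span> (ANNA)</a></td>
    <td class="teamid" style="">ANNA:BT</td>
</tr>
</table>
"""

root = etree.HTML(html)

# findall() returns a list of Element objects; read .text on each one.
teamids_findall = [td.text for td in root.findall('.//td[@class="teamid"]')]

# find() returns only the first match, or None if nothing matches.
first = root.find('.//td[@class="teamid"]')
first_teamid = first.text if first is not None else None

# xpath() also returns every match; ending the expression with text()
# yields strings directly instead of Element objects.
teamids_xpath = root.xpath('//td[@class="teamid"]/text()')

# .text only holds the text before the first child element, so it is None
# for the nested "name" cell; itertext() walks all descendant text nodes.
name_td = root.find('.//td[@class="name"]')
full_name = "".join(name_td.itertext())

print(teamids_findall)
print(first_teamid)
print(teamids_xpath)
print(full_name)
```

With this sample, both findall() and xpath() pick up the two teamid cells, so xpath() should not be limited to the first match as long as the expression itself does not select a single node. Neither call needs explicit recursion, since the // (or .//) step already searches all descendants.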
