Extracting the value of a specific HTML element with XPath in Python

Question

I have tried this:

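import requests
from lxml import html
s = requests.Session()  # assumption: `s` in the original snippet is a requests session
url = 'http://test.ir/'
content = s.get(url).content
tree = html.fromstring(content)
# Note: `self:div` is read as a namespace-prefixed name rather than the self axis (`self::div`), and text() yields plain strings, which have no .text_content() method.
print([e.text_content() for e in tree.xpath('//div[@class="grouptext"]/text()[not(self:div)]')])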

As you can see in the picture, I want the selected part.

When I use

print([e.text_content() for e in tree.xpath('//div[@class="grouptext"]')])

the result shows me the selected part, but also the content of <div class="grouptext">.
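To make the difference concrete, here is a minimal sketch on hypothetical markup (the real page structure was only visible in the missing picture): the wanted text sits directly inside a styled <div class="grouptext"> that also contains a nested <div class="grouptext">. text_content() returns the text of an element and all of its descendants, whereas the XPath text() step returns only the element's own text nodes, as plain strings. The helper name describe and the SAMPLE markup are illustrative only.

```python
from lxml import html

# Hypothetical markup standing in for the page shown in the missing picture.
SAMPLE = """
<html><body>
  <div class="grouptext" style="padding:5px">Wanted text
    <div class="grouptext">Unwanted text</div>
  </div>
</body></html>
"""

def describe(tree):
    """For each grouptext <div>, pair its full text with its own text nodes."""
    rows = []
    for e in tree.xpath('//div[@class="grouptext"]'):
        # text_content() concatenates the text of every descendant.
        full = " ".join(e.text_content().split())  # collapse whitespace
        # text() returns only this element's direct text nodes (strings).
        own = [t.strip() for t in e.xpath('text()') if t.strip()]
        rows.append((full, own))
    return rows

for full, own in describe(html.fromstring(SAMPLE)):
    print(repr(full), own)
```

On this sample the outer <div> reports "Wanted text Unwanted text" through text_content() but only "Wanted text" through text(), and the nested <div> is matched by the class test as well.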

Answer 1

Assuming that you just want the text() of the first occurrence of the <div> tag, you have to be more specific in your XPath expression. Either you tell the system explicitly that you want the first one by adding [1]:

print([e.text_content() for e in tree.xpath('//div[@class="grouptext"][1]')])
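One detail about [1] that is easy to miss: in //div[@class="grouptext"][1] the predicate belongs to the child step, so it keeps every matching <div> that is the first such child of its own parent, not just the first one in the page. To get the first match in document order, the path can be wrapped in parentheses. A minimal sketch on hypothetical markup where the two <div>s have different parents:

```python
from lxml import html

# Hypothetical markup: two grouptext <div>s under different parents.
SAMPLE = """
<html><body>
  <div id="left"><div class="grouptext" style="color:red">First</div></div>
  <div id="right"><div class="grouptext">Second</div></div>
</body></html>
"""

tree = html.fromstring(SAMPLE)

# [1] binds to the step: each <div> that is the first grouptext child
# of its own parent matches, so both are returned here.
per_parent = [e.text_content() for e in tree.xpath('//div[@class="grouptext"][1]')]

# Parentheses apply [1] to the whole node-set in document order.
first_in_doc = [e.text_content() for e in tree.xpath('(//div[@class="grouptext"])[1]')]

print(per_parent)    # ['First', 'Second']
print(first_in_doc)  # ['First']
```

When all the grouptext <div>s are siblings, both forms give the same result, which may be why the shorter one works on a given page.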

Or you could select it by filtering on the style attribute:

print([e.text_content() for e in tree.xpath('//div[@class="grouptext" and @style]')])
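Note that a bare @style inside the predicate only tests that the attribute exists. If other grouptext <div>s on the page also carry a style attribute, comparing its value is one way to narrow the match. The markup, the style values and the helper name texts below are hypothetical.

```python
from lxml import html

# Hypothetical markup: both <div>s have a style attribute,
# but only the wanted one has this particular value.
SAMPLE = """
<html><body>
  <div class="grouptext" style="color:gray">Other text</div>
  <div class="grouptext" style="direction:rtl">Wanted text</div>
</body></html>
"""

tree = html.fromstring(SAMPLE)

def texts(expr):
    """Return text_content() of every element matched by the XPath expression."""
    return [e.text_content() for e in tree.xpath(expr)]

# @style alone is an existence test, so both <div>s match.
print(texts('//div[@class="grouptext" and @style]'))
# Comparing the value narrows it down (exact match, or substring via contains()).
print(texts('//div[@class="grouptext" and @style="direction:rtl"]'))
print(texts('//div[@class="grouptext" and contains(@style, "rtl")]'))
```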

You will have to decide which is the better way to go. This depends on how the <div> tags appear in your HTML in the more general case.
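One way to make that decision is to run both expressions against the layouts you expect. The sketch below uses two hypothetical layouts that differ only in the order of the <div>s; the layout names, the compare helper and the style value are illustrative. The positional expression follows the order, while the attribute filter keeps finding the styled <div>.

```python
from lxml import html

# Two hypothetical layouts of the same page.
LAYOUTS = {
    "styled div first": """<html><body>
        <div class="grouptext" style="direction:rtl">Wanted text</div>
        <div class="grouptext">Other text</div>
    </body></html>""",
    "styled div second": """<html><body>
        <div class="grouptext">Other text</div>
        <div class="grouptext" style="direction:rtl">Wanted text</div>
    </body></html>""",
}

# The two expressions from the answer.
EXPRESSIONS = {
    "position": '//div[@class="grouptext"][1]',
    "style": '//div[@class="grouptext" and @style]',
}

def compare(layouts, expressions):
    """Return {layout name: {expression name: [matched text, ...]}}."""
    results = {}
    for name, markup in layouts.items():
        tree = html.fromstring(markup)
        results[name] = {
            label: [e.text_content() for e in tree.xpath(expr)]
            for label, expr in expressions.items()
        }
    return results

for layout, found in compare(LAYOUTS, EXPRESSIONS).items():
    print(layout, found)
```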

Answer 1: 1 vote, accepted, 2014-09-30 19:16:51
