Extracting the value of a specific HTML element using XPath in Python

I have tried this:

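import requests
from lxml import html
s = requests.Session()  # assumption: `s` is a requests session (it is not defined in the question)
url = 'http://test.ir/'
content = s.get(url).content
tree = html.fromstring(content)
# Fails: "self:div" is not the self:: axis, and text() returns strings, which have no text_content()
print([e.text_content() for e in tree.xpath('//div[@class="grouptext"]/text()[not(self:div)]')])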
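For reference, here is a minimal sketch of the attempted approach with the axis written correctly (`self::div`). It runs on hypothetical markup with two sibling grouptext divs, since the real page and screenshot are not available. The expression now executes, but the `[not(self::div)]` predicate filters nothing (a text node is never an element), and `text()` still returns the direct text of every matching div, as strings rather than elements:

```python
from lxml import html

# Hypothetical markup: the question's real page is not available.
SAMPLE = (
    '<html><body>'
    '<div class="grouptext" style="color:red">Selected part</div>'
    '<div class="grouptext">Other content</div>'
    '</body></html>'
)

tree = html.fromstring(SAMPLE)

# Axis written as "self::div" instead of "self:div".
with_predicate = tree.xpath('//div[@class="grouptext"]/text()[not(self::div)]')
# Without the predicate: identical result, since text() never selects a <div>.
plain = tree.xpath('//div[@class="grouptext"]/text()')

# text() yields strings, so use str methods; text_content() exists only on elements.
texts = [t.strip() for t in plain]
print(texts)  # ['Selected part', 'Other content']
```

So the selection has to be narrowed at the `div` step, not at the text-node step.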

As you can see in the picture, I want to extract the selected part.

When I use

print([e.text_content() for e in tree.xpath('//div[@class="grouptext"]')])

the result contains the selected part, but also the content of <div class="grouptext">.

Assuming that you just want the text of the first occurrence of the <div> tag, you have to be more specific in your XPath expression. Either you explicitly tell the system that you want the first one by adding [1]:

print([e.text_content() for e in tree.xpath('//div[@class="grouptext"][1]')])
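One caveat about `[1]`, sketched on hypothetical markup in which each grouptext div has its own parent: in `//div[@class="grouptext"][1]` the predicate applies to the `div` step, so it selects the first grouptext div among each parent's children, not the first one in the document. Wrapping the path in parentheses applies `[1]` to the whole result set instead:

```python
from lxml import html

# Hypothetical markup: each grouptext div sits inside its own wrapper div.
SAMPLE = (
    '<html><body>'
    '<div><div class="grouptext" style="color:red">Selected part</div></div>'
    '<div><div class="grouptext">Other content</div></div>'
    '</body></html>'
)

tree = html.fromstring(SAMPLE)

# [1] binds to the div step: first grouptext div under each parent.
per_parent = [e.text_content() for e in tree.xpath('//div[@class="grouptext"][1]')]

# Parentheses apply [1] to the whole node-set: first match in document order.
first_in_doc = [e.text_content() for e in tree.xpath('(//div[@class="grouptext"])[1]')]

print(per_parent)    # ['Selected part', 'Other content']
print(first_in_doc)  # ['Selected part']
```

When all grouptext divs share one parent, both forms return the same element.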

or you can select it by filtering on the style attribute:

print([e.text_content() for e in tree.xpath('//div[@class="grouptext" and @style]')])

You will have to decide which approach is better. That depends on how the <div> tags appear in your HTML in the general case.
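To illustrate the trade-off, here is a sketch on hypothetical markup in which the styled div happens to come second. The position-based expression then picks the other div, while the attribute filter still finds the styled one. The reverse failure is also possible, for example if the target div has no style attribute on some pages or another grouptext div gains one:

```python
from lxml import html

# Hypothetical markup: the styled div is NOT the first grouptext div here.
SAMPLE = (
    '<html><body>'
    '<div class="grouptext">Other content</div>'
    '<div class="grouptext" style="color:red">Selected part</div>'
    '</body></html>'
)

tree = html.fromstring(SAMPLE)


def texts(expr):
    """Return text_content() of every element matched by an XPath expression."""
    return [e.text_content() for e in tree.xpath(expr)]


by_position = texts('//div[@class="grouptext"][1]')
by_attribute = texts('//div[@class="grouptext" and @style]')

print(by_position)   # ['Other content']  (position no longer identifies the target)
print(by_attribute)  # ['Selected part']
```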

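Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish this content, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.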

Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM