Extracting the value of a specific HTML element using XPath in Python
I have tried this:
url = 'http://test.ir/'
content = s.get(url).content
tree = html.fromstring(content)
print [e.text_content() for e in tree.xpath('//div[@class="grouptext"]/text()[not(self:div)]')]
As you can see in the picture, I want only the selected part.
When I use
print [e.text_content() for e in tree.xpath('//div[@class="grouptext"]')]
the result shows me the selected part and the content of <div class="grouptext"> as well.
Assuming that you just want the text() of the first occurrence of the <div> tag, you have to be more specific in your XPath expression. Either you tell the system that you explicitly want the first one by adding [1]:
print [e.text_content() for e in tree.xpath('//div[@class="grouptext"][1]')]
or you could select it by filtering for the style attribute:
print [e.text_content() for e in tree.xpath('//div[@class="grouptext" and @style]')]
You will have to decide which is the better way to go. This will depend on how the <div> tags show up in your document in a more general case.
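To make the difference between these selectors concrete, here is a minimal sketch in Python 3 with lxml. The real page markup is not shown in the question, so SAMPLE is a purely hypothetical snippet in which one grouptext <div> is nested inside another; the variable names are illustrative too. It also shows two things worth knowing: a positional predicate such as [1] is applied per parent unless the expression is wrapped in parentheses, and a path ending in /text() returns plain strings, which have no text_content() method.

```python
from lxml import html

# Hypothetical markup standing in for the page in the question.
SAMPLE = """<html><body>
<div class="grouptext" style="color: red">wanted text<div class="grouptext">nested text</div></div>
<div class="grouptext">another block</div>
</body></html>"""

tree = html.fromstring(SAMPLE)

# [1] is evaluated relative to each parent, so this matches the outer div
# (first match under <body>) and the nested div (first match under the outer div).
first_per_parent = tree.xpath('//div[@class="grouptext"][1]')

# Parenthesising the path first makes [1] pick the first match in the whole document.
first_in_doc = tree.xpath('(//div[@class="grouptext"])[1]')

# Filtering on the presence of a style attribute matches only the outer div here.
with_style = tree.xpath('//div[@class="grouptext" and @style]')

# text_content() includes the text of all descendants ...
full_text = with_style[0].text_content()

# ... whereas /text() yields only the div's own text nodes, as plain strings.
direct_text = tree.xpath('//div[@class="grouptext" and @style]/text()')

print(len(first_per_parent), len(first_in_doc))
print(repr(full_text))
print(direct_text)
```

If the real page nests its <div> tags like this sample does, the /text() form is probably the closest match to "only the selected part"; if not, either selector from the answer may already be enough.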