Scraping the text in a span tag using Python, XPath, and the lxml package

Question

I am trying to get the text from a tag on a web page. Using Chrome's Inspect element feature, I see that the text I want is in the following:

<span id>
    <b> Armor Class </b>
    " 12"
</span>

All I want is the text " 12" from the above. To this end, I have the following Python code:

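from lxml import html
import requests

# webString was not defined in the original snippet; the page URL from the EDIT below is assumed
webString = 'https://jsigvard.com/dnd/monster.php?m=Aarakocra'
page = requests.get(webString)
tree = html.fromstring(page.content)

# Absolute path copied from Chrome's Inspect element feature
monsterArmor = tree.xpath('/html/body/div[1]/span[2]/text()')
print(monsterArmor)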

The path passed to tree.xpath() is the result of copy/pasting the path from Chrome's Inspect element feature.

When I print it, though, it returns an empty list, []. I am not sure what I am doing wrong. I have seen similar questions, but they all seem to involve etree, and the examples given all seem to have the information hardcoded into them rather than scraping it.

EDIT: Here is a screenshot of the page information from Chrome's Inspect:

EDIT: The page URL is https://jsigvard.com/dnd/monster.php?m=Aarakocra

Answer 1

Try something like:

for el in tree.xpath('//span[./b[.="Armor Class"]]/text()'):
    print(el)

The output should be 12.
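If the label in the real page is padded with whitespace, as the snippet in the question suggests, an exact match on "Armor Class" could miss it. Here is a minimal, self-contained sketch of a more tolerant variant. The SAMPLE markup is a hypothetical stand-in modeled on the question's snippet, not the actual page source, and the second span is made up for illustration:

```python
from lxml import html

# Hypothetical stand-in for the downloaded page, modeled on the snippet
# in the question; the "Speed" span is invented for illustration only.
SAMPLE = """
<html><body>
<div>
  <span><b> Armor Class </b> 12</span>
  <span><b> Speed </b> 20 ft.</span>
</div>
</body></html>
"""

tree = html.fromstring(SAMPLE)

# Relative path: find any <span> whose <b> child reads "Armor Class" once
# surrounding whitespace is normalized, then take the span's own text nodes.
texts = tree.xpath('//span[b[normalize-space()="Armor Class"]]/text()')

# Text nodes keep their whitespace, so strip them and drop empty ones.
armor_class = [t.strip() for t in texts if t.strip()]
print(armor_class)
```

Because the expression is anchored on the label rather than on the element's position, it does not depend on the exact nesting that a path copied from the browser encodes.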

Answer 1 was accepted on 2020-07-07 18:57:03.
