Parsing XML with namespaces from a request using lxml in Python

Question

I am trying to get some text from a table in an online XML file. I can find the tables:

from lxml import etree
import requests

main_file = requests.get('https://training.gov.au/TrainingComponentFiles/CUA/CUAWRT601_R1.xml')
main_file.encoding = 'utf-8-sig'
root = etree.fromstring(main_file.content)
tables = root.xpath('//foo:table', namespaces={"foo": "http://www.authorit.com/xml/authorit"})

print(tables)

But I can't get any further. The text I am looking for is:

Prepare to write scripts
Write draft scripts
Produce final scripts

When I paste the XML here: http://xpather.com/

I can get it with the following expression: //table[1]/tr/td[@width="2700"]/p[@id="4"][not(*)]/text()

But that doesn't work in my script, and I'm out of ideas. How can I get that text?

Answer 1

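Use the namespace prefix you declared (with namespaces={"foo": "http://www.authorit.com/xml/authorit"}) on every element step of the path. In XPath 1.0 an unprefixed name only matches elements in no namespace, so the unprefixed steps find nothing in this document. For example, instead of //table[1]/tr/td[@width="2700"]/p[@id="4"][not(*)]/text() use //foo:table[1]/foo:tr/foo:td[@width="2700"]/foo:p[@id="4"][not(*)]/text(). The attribute tests (@width, @id) stay unprefixed because those attributes are not in a namespace.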
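
A minimal sketch of the prefixed expression in use. To keep it runnable offline, it parses a small inline document instead of the downloaded file; only the namespace URI and the XPath come from the question, while SAMPLE, its structure, and the extract_texts helper are assumptions made for illustration.

```python
from lxml import etree

# Hypothetical stand-in for the downloaded file. Only the namespace URI is
# taken from the question; the element layout is assumed for illustration.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<root xmlns="http://www.authorit.com/xml/authorit">
  <table>
    <tr>
      <td width="2700"><p id="4">Prepare to write scripts</p></td>
      <td width="6300"><p id="4">Other text</p></td>
    </tr>
    <tr>
      <td width="2700"><p id="4">Write draft scripts</p></td>
    </tr>
    <tr>
      <td width="2700"><p id="4"><b>Heading</b></p></td>
    </tr>
  </table>
</root>"""

# The prefix name is arbitrary; it only has to map to the document's namespace URI.
NS = {"foo": "http://www.authorit.com/xml/authorit"}


def extract_texts(xml_bytes):
    """Return the text of the matching <p> elements that have no child elements."""
    root = etree.fromstring(xml_bytes)
    return root.xpath(
        '//foo:table[1]/foo:tr/foo:td[@width="2700"]'
        '/foo:p[@id="4"][not(*)]/text()',
        namespaces=NS,
    )


texts = extract_texts(SAMPLE)
print(texts)
```

With the real file, the same call should work by passing main_file.content to extract_texts, assuming the document's elements are all in that namespace.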

0 Accepted 2022-11-27 08:18:32
