How to extract an element, its sub-elements and the full path from XML in Python?

Question

I would like to extract an element from XML, including its sub-elements and its full path.

If this is my XML document:

<world>
    <countries>
        <country>
            <name>a</name>
            <description>a short description</description>
            <population>
                <now>250000</now>
                <2000>100000</2000>
            </population>
        </country>
        <country>
            <name>b</name>
            <description>b short description</description>
            <population>
                <now>350000</now>
                <2000>150000</2000>
            </population>
        </country>
    </countries>
</world>

Based on the XPath expression '//country[name="a"]', I would like to end up with this:

<world>
    <countries>
        <country>
            <name>a</name>
            <description>a short description</description>
            <population>
                <now>250000</now>
                <2000>100000</2000>
            </population>
        </country>
    </countries>
</world>

Answer 1

This kind of thing can be handled with XPath using lxml.
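The XPath expression in the question selects the <country> element itself; the "full path" is the chain of its ancestors. lxml can report that chain directly through the tree's getpath() method. As a minimal sketch that needs only the standard library, the snippet below uses xml.etree.ElementTree instead, which supports a subset of XPath (including [tag='text'] predicates, but only relative paths such as './/country', not '//country'). The helper element_path() is hypothetical, written for this illustration, and the sample is a trimmed copy of the question's XML without the invalid <2000> elements.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's document (the invalid <2000> tags are omitted)
XML = """<world><countries>
  <country><name>a</name><population><now>250000</now></population></country>
  <country><name>b</name><population><now>350000</now></population></country>
</countries></world>"""


def element_path(root: ET.Element, target: ET.Element) -> str:
    """Hypothetical helper: build an XPath-like path from root down to target."""
    # ElementTree has no getparent(), so map every child to its parent once
    parent = {child: p for p in root.iter() for child in p}
    parts = []
    node = target
    while node is not None:
        p = parent.get(node)
        if p is None:
            parts.append(node.tag)  # reached the root element
        else:
            # Add a 1-based position only when siblings share the same tag
            same = [c for c in p if c.tag == node.tag]
            parts.append(f"{node.tag}[{same.index(node) + 1}]" if len(same) > 1 else node.tag)
        node = p
    return "/" + "/".join(reversed(parts))


root = ET.fromstring(XML)
match = root.find('.//country[name="a"]')  # ElementTree needs a relative path
print(match.findtext("population/now"))    # 250000
print(element_path(root, match))           # /world/countries/country[1]
```

The position index is only added where same-named siblings make the step ambiguous, so single children such as <countries> appear without one.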

One caveat, though: one of the tags (<2000>) is not a valid XML element name, because element names cannot begin with a digit. If you have no control over the source, you have to rename the offending tag before parsing and change it back after processing.
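To show why the rename is needed, and to keep it from touching text content that happens to contain the same digits (a plain str.replace('2000', 'xxx') would also rewrite a value such as <now>2000</now>), the substitution can be limited to tag names. This is a minimal sketch using the standard library's re and xml.etree.ElementTree; rename_tag() and the placeholder name y2000 are hypothetical choices, and the regex assumes simple markup like the question's (no comments or CDATA sections that contain the tag).

```python
import re
import xml.etree.ElementTree as ET

BAD_TAG = "2000"       # invalid element name from the question
PLACEHOLDER = "y2000"  # hypothetical stand-in: any valid name not already used


def rename_tag(text: str, old: str, new: str) -> str:
    """Rename <old ...>, </old> and <old/> tags without touching text content."""
    pattern = r"<(/?)" + re.escape(old) + r"(?=[\s/>])"
    # \g<1> (not \1) so a replacement name starting with a digit stays unambiguous
    return re.sub(pattern, r"<\g<1>" + new, text)


sample = "<population><now>2000</now><2000>100000</2000></population>"

try:
    ET.fromstring(sample)  # a conforming XML parser rejects <2000>
except ET.ParseError as exc:
    print("cannot parse:", exc)

fixed = rename_tag(sample, BAD_TAG, PLACEHOLDER)
root = ET.fromstring(fixed)
print(root.findtext(PLACEHOLDER))  # 100000
print(root.findtext("now"))        # 2000 (text content left alone)

# ...process the tree here, then restore the original tag name
restored = rename_tag(ET.tostring(root, encoding="unicode"), PLACEHOLDER, BAD_TAG)
print(restored == sample)          # True
```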

Putting it all together:

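import lxml.html as lh

countries = """[your xml above]"""  # the XML document from the question

# <2000> is not a valid tag name: rename only the tags, not text such as "2000"
doc = lh.fromstring(countries.replace('<2000>', '<xxx>').replace('</2000>', '</xxx>'))

# Remove every <country> whose <name> is not "a"; its ancestors stay in place
for country in doc.xpath('//country'):
    if country.xpath('./name/text()')[0] != 'a':
        country.getparent().remove(country)

# Serialize and restore the original tag name
print(lh.tostring(doc).decode().replace('<xxx>', '<2000>').replace('</xxx>', '</2000>'))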

Output:

<world>
    <countries>
        <country>
            <name>a</name>
            <description>a short description</description>
            <population>
                <now>250000</now>
                <2000>100000</2000>
            </population>
        </country>
        </countries>
</world>
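The loop above hard-codes one condition on <name>. A more general way to get the "element plus full path" result for any path expression is to keep every matching element with its whole subtree, keep the ancestors of each match, and remove everything else. Below is a minimal sketch of that idea using only the standard library's xml.etree.ElementTree, so the expression has to use its XPath subset (e.g. './/country[name="a"]'). prune_to_matches() is a hypothetical helper, and the sample omits the <2000> elements, which would still have to be renamed before parsing.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's document (the invalid <2000> tags are omitted)
XML = """<world>
    <countries>
        <country>
            <name>a</name>
            <description>a short description</description>
            <population><now>250000</now></population>
        </country>
        <country>
            <name>b</name>
            <description>b short description</description>
            <population><now>350000</now></population>
        </country>
    </countries>
</world>"""


def prune_to_matches(root: ET.Element, path: str) -> ET.Element:
    """Hypothetical helper: keep matches (with subtrees) and their ancestors only."""
    parent = {child: p for p in root.iter() for child in p}
    keep = set()
    for match in root.findall(path):
        keep.update(match.iter())  # the match and all of its descendants
        node = match
        while node in parent:      # walk up to the root
            node = parent[node]
            keep.add(node)
    for p in list(root.iter()):    # snapshot first, because the tree is mutated
        for child in list(p):
            if child not in keep:
                p.remove(child)
    return root


root = prune_to_matches(ET.fromstring(XML), './/country[name="a"]')
ET.indent(root)  # re-indent after removals (Python 3.9+)
print(ET.tostring(root, encoding="unicode"))
```

Like lxml's remove(), ElementTree's remove() drops an element together with its tail text, which is why the sketch re-indents the tree before printing.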

Answer 1 (score 0, posted 2021-01-29 13:20:47)
