Python LXML etree.iterparse. Check if current element complies with XPath

I would like to read a fairly large XML file as a stream, but I could not find a way to reuse my existing XPath expressions to find elements. Previously the files were of moderate size, so it was enough to do this:

all_elements = []
for xpath in list_of_xpathes:
    all_elements.append(etree.parse(file).getroot().findall(xpath))

Now I am struggling with iterparse. Ideally, the solution would compare the path of the current element with the desired XPath:

import lxml.etree as et

xml_file = r"my.xml" # quite big xml, that i should read
xml_paths = ['/some/arbitrary/xpath', '/another/xpath']

all_elements = []
iter = et.iterparse(xml_file, events = ('end',))
for event, element in iter:
    for xpath in xml_paths:
        if element_complies_with_xpath(element, xpath):
            all_elements.append(element)
            break

How can the element_complies_with_xpath function be implemented using lxml?

If the first step of the XPath can be split off, the rest can be tested against each element as shown below. Instead of a list of strings, a dict of <first element name>: <rest of the xpath> could be used. A parent element could also be used as the dict key.
Full xpath: /some/arbitrary/xpath
dict: {'some': './arbitrary/xpath'}
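
The split described above could be automated roughly as follows. This is only a sketch: the helper names split_xpath and to_lookup are hypothetical, and it assumes simple absolute paths made of plain element names (no `//`, no predicates that contain a slash, no namespaces).

```python
def split_xpath(full_xpath):
    """Split '/some/arbitrary/xpath' into ('some', './arbitrary/xpath').

    Assumption: the path is a plain chain of element names.
    """
    first, _, rest = full_xpath.strip("/").partition("/")
    relative = ("./" + rest) if rest else "."
    return first, relative


def to_lookup(full_xpaths):
    """Group the relative remainders by their first element name."""
    lookup = {}
    for full_xpath in full_xpaths:
        first, relative = split_xpath(full_xpath)
        lookup.setdefault(first, []).append(relative)
    return lookup


print(to_lookup(["/some/arbitrary/xpath", "/another/xpath"]))
```

Note that the first step of an absolute path is the document root, whose 'end' event only fires once the whole file has been parsed. For streaming it may be more useful to key the dict on an element deeper in the tree, as the remark about using a parent element suggests.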

import lxml.etree as et

def element_complies_with_xpath(element, xpath):
    children = element.xpath(xpath)
    print([ "child:" + x.tag for x in children])
    return len(children) > 0

xml_file = r"/home/lmc/tmp/test.xml" # quite big xml, that i should read
xml_paths = [{'membership': './users/user'}, {'entry':'author/name'}]

all_elements = []
iter1 = et.iterparse(xml_file, events = ('end',))

for event, element in iter1:
    for d in xml_paths:
        if element.tag in d and element_complies_with_xpath(element, d[element.tag]):
            all_elements.append(element)
            break

print([x.tag for x in all_elements])
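
To try the approach above without a file on disk, the same loop can be run against a small in-memory document. The sample XML below is made up for illustration, and a single dict replaces the list of one-entry dicts, which is a simplification rather than the answer's exact data structure.

```python
import io

import lxml.etree as et

# Hypothetical sample document, invented for this sketch.
SAMPLE = b"""<root>
  <membership><users><user>alice</user></users></membership>
  <membership><users/></membership>
  <entry><author><name>bob</name></author></entry>
  <entry><title>no author</title></entry>
</root>"""

# Element tag -> relative XPath that must match at least one node.
xml_paths = {"membership": "./users/user", "entry": "author/name"}


def element_complies_with_xpath(element, xpath):
    # For node-set expressions, element.xpath() returns a list;
    # an empty list means nothing matched.
    return len(element.xpath(xpath)) > 0


matched = []
# iterparse accepts a file-like object as well as a file name.
for event, element in et.iterparse(io.BytesIO(SAMPLE), events=("end",)):
    xpath = xml_paths.get(element.tag)
    if xpath is not None and element_complies_with_xpath(element, xpath):
        matched.append(element)

print([el.tag for el in matched])  # ['membership', 'entry']
```

The relative XPath is evaluated on the 'end' event because only then are the element's children guaranteed to be fully parsed.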

The count() XPath function could also be used:

def element_complies_with_xpath(element, xpath):
    children = element.xpath(xpath)
    print( f"child exist: {children}")
    return children

xml_file = r"/home/luis/tmp/test.xml" # quite big xml, that i should read
xml_paths = [{'membership': 'count(./users/user) > 0'}, {'entry':'count(author/name) > 0'}]
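
The count() variant above only shows the function and the paths, so here is a minimal sketch of how it might be wired into the iterparse loop. The sample XML is hypothetical and a single dict is assumed instead of the list of dicts. With a comparison such as `count(...) > 0`, lxml returns a Python bool rather than a list.

```python
import io

import lxml.etree as et

# Hypothetical sample document, invented for this sketch.
SAMPLE = b"<root><entry><author><name>bob</name></author></entry><entry/></root>"

# Element tag -> boolean XPath expression evaluated relative to that element.
xml_paths = {"entry": "count(author/name) > 0"}


def element_complies_with_xpath(element, xpath):
    # A boolean XPath expression evaluates to True or False in lxml.
    return bool(element.xpath(xpath))


matched = []
for event, element in et.iterparse(io.BytesIO(SAMPLE), events=("end",)):
    xpath = xml_paths.get(element.tag)
    if xpath is not None and element_complies_with_xpath(element, xpath):
        matched.append(element)

print(len(matched))  # 1
```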
