
Python lxml xpath find node with text()=concat('x', 'y')

I'm trying to parse an XML file with Python, lxml and XPath. The structure looks like this:

<body>
  <tu changedate="20190822T080742Z" creationdate="20190822T085527Z" creationid="blank" changeid="blank">
    <prop type="client"> </prop>
    <prop type="project"> </prop>
    <prop type="domain"> </prop>
    <prop type="subject"> </prop>
    <prop type="corrected">no</prop>
    <prop type="aligned">no</prop>
    <prop type="x-document">Test_EN.docx</prop>
    <prop type="x-Project Id">0001</prop>
    <prop type="x-Product group">A</prop>
    <prop type="x-Product">A</prop>
    <prop type="x-Product">B</prop>
    <prop type="x-TestList">TestValue1</prop>
    <prop type="x-TestList">TestValue2</prop>
    <prop type="x-Sample">SampleText</prop>
    <prop type="x-Test">TestText</prop>
    <prop type="x-Name">TestName</prop>

To find nodes dynamically from a function, I store the type and the text of the node I'm looking for in variables:

node_name = "x-Sample"
node_value = "SampleText"
xpath_expression = f'//body/tu/prop[@type="{node_name}"][text()="{node_value}"]'
elements = tree.xpath(xpath_expression)

The problem is that node_value can contain double quotes, which produces an invalid XPath expression. Since I'm stuck with lxml, which only supports XPath 1.0, I can't escape them inside the string literal.

Looking through Stack Overflow, I found that in XPath 1.0 this can apparently only be done with concat(). I also found the following function posted:

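def xpath_string_escape(input_str):
    """ creates a concatenation of alternately-quoted strings that is always a valid XPath expression """
    # Split on single quotes: each piece is safe inside a single-quoted literal
    # (double quotes included), and every single quote is re-inserted as "'".
    parts = input_str.split("'")
    return "concat('" + "', \"'\" , '".join(parts) + "', '')"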

Which then gives me this:

xpath_expression = '//body/tu/tuv/prop[@type="x-Sample"][text()="concat('SampleText', '')"]'

However this doesn't return the nodes I'm looking for.
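
The likely reason is that the generated concat(...) call ends up inside double quotes, so XPath compares text() against the literal string "concat('SampleText', '')" instead of evaluating the function. Below is a minimal sketch of splicing the escaped value in without surrounding quotes. It assumes lxml is installed; the build_expression helper, the cut-down document and the quoted sample value are made up for illustration and are not part of the original post.

```python
from lxml import etree


def xpath_string_escape(input_str):
    """Return an XPath 1.0 string expression that evaluates to input_str."""
    # Each piece goes into a single-quoted literal (which may contain
    # double quotes); every single quote is re-inserted as the literal "'".
    parts = input_str.split("'")
    return "concat('" + "', \"'\", '".join(parts) + "', '')"


def build_expression(node_name, node_value):
    # Hypothetical helper: the escaped values are NOT wrapped in quotes,
    # so concat() is evaluated as a function call by the XPath engine.
    return (
        f"//body/tu/prop[@type={xpath_string_escape(node_name)}]"
        f"[text()={xpath_string_escape(node_value)}]"
    )


# Cut-down sample document; the quoted value is invented for this sketch.
xml = b"""<body><tu>
  <prop type="x-Sample">say "hi" to 'lxml'</prop>
  <prop type="x-Test">TestText</prop>
</tu></body>"""
tree = etree.fromstring(xml)

elements = tree.xpath(build_expression("x-Sample", "say \"hi\" to 'lxml'"))
print([e.text for e in elements])
```

Escaping node_name as well keeps the expression valid if the type attribute ever contains quotes.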

Alternatively, you can strip the double quotes from the node value with the XPath translate() function:

node_value = tree.xpath('''translate(//prop[@type="x-Sample"]/text(), '"', "")''')

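Then match with contains() instead of comparing text() for equality. Note that this only finds the node when the stripped value is still a substring of the node's text, for example when the quotes sit at the start or end of the text: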

xpath_expression = f'//body/tu/prop[@type="{node_name}"][contains(.,"{node_value}")]'
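
Another option that avoids escaping altogether is to pass the values as XPath variables, which lxml's xpath() method accepts as keyword arguments. The following is a sketch under the assumption that lxml is installed; the find_props helper and the sample value 5" disk are hypothetical and only serve to show a value containing a double quote.

```python
from lxml import etree

# Cut-down sample document; the first value contains a double quote.
xml = b"""<body><tu>
  <prop type="x-Sample">5" disk</prop>
  <prop type="x-Sample">SampleText</prop>
</tu></body>"""
tree = etree.fromstring(xml)


def find_props(tree, node_name, node_value):
    # $name and $value are XPath variables; lxml binds them from the
    # keyword arguments, so the values never become part of the
    # expression string and need no quoting.
    return tree.xpath(
        "//body/tu/prop[@type=$name][text()=$value]",
        name=node_name,
        value=node_value,
    )


matches = find_props(tree, "x-Sample", '5" disk')
print([m.text for m in matches])
```

Because the expression string stays constant, this also works for values that mix single and double quotes.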
