Why does python lxml etree xpath return more than one element?

I am using lxml etree in Python 3.

My XPath expression looks like this, and it finds the elements I am looking for in my XHTML:

root = tree.getroot()
map = {'epub': 'http://www.idpf.org/2007/ops', 'm': "http://www.w3.org/1998/Math/MathML"}
mathML_elements = tree.xpath(".//m:math", namespaces=map)

A sample of the parsed XHTML looks like this:

</td><td><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" alttext="-500"><m:mrow><m:mo>-</m:mo><m:mn>500</m:mn></m:mrow></m:math></td><td>0</td></tr><tr><td>8</td><td>Betalt renter på lånet</td><td>413</td><td></td><td>+</td><td><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" alttext="-413"><m:mrow><m:mo>-</m:mo><m:mn>413</m:mn></m:mrow></m:math></td><td>=</td><td><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" alttext="-413"><m:mrow><m:mo>-</m:mo><m:mn>413</m:mn></m:mrow></m:math></td><td>+</td><td></td><td>0</td></tr><tr><td>9</td><td>Avskrivning av pc og inventar</td><td>300</td><td><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" alttext="-300"><m:mrow><m:mo>-</m:mo><m:mn>300</m:mn></m:mrow></m:math></td><td>+</td><td></td><td>=</td><td><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" alttext="-300"><m:mrow><m:mo>-</m:mo><m:mn>300</m:mn></m:mrow></m:math></td><td>+</td><td></td><td>0</td></tr><tr><td>10</td><td>Uttak eier privat</td><td><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" alttext="-14 000"><m:mrow><m:mo>-</m:mo><m:mn>14 000</m:mn></m:mrow></m:math></td><td></td><td>+</td><td><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" alttext="-14 000"><m:mrow><m:mo>-</m:mo><m:mn>14 000</m:mn></m:mrow></m:math></td><td></td><td><m:math xmlns:m="http://www.w3.org/1998/Math/MathML" alttext="-14 000"><m:mrow><m:mo>-</m:mo><m:mn>14 000</m:mn></m:mrow></m:math></td><td>+</td><td></td><td>0</td></tr><tr><td></td><td>Balansekontoer</td><td></td><td>29 700</td><td>+</td><td>122 680</td><td>=</td><td>103 080</td><td>+</td><td>49 500</td><td>0</td></tr><tr><td></td><td>Balansesum</td><td></td><td></td><td></td><td>152 080</td><td>=</td><td>152 080</td><td></td><td></td><td>0</td></tr></tbody></table>
<p>Vi ser at Trine Dals egenkapital har økt med <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" alttext="kr 1037 (kr 103080 - 102043)"><m:mrow><m:mi>kr </m:mi><m:mn>1 037</m:mn><m:mo>⁡</m:mo><m:mfenced><m:mrow><m:mi>kr </m:mi><m:mn>103 080</m:mn><m:mo>-</m:mo><m:mn>102 043</m:mn></m:mrow></m:mfenced></m:mrow></m:math>. Det betyr at det egentlige resultatet av driften denne måneden må være <m:math xmlns:m="http://www.w3.org/1998/Math/MathML" alttext="kr 1037 + kr 14000 = kr 15037"><m:mrow><m:mi>kr </m:mi><m:mn>1 037</m:mn><m:mo>+</m:mo><m:mi>kr </m:mi><m:mn>14 000</m:mn><m:mo>=</m:mo><m:mi>kr </m:mi><m:mn>15 037</m:mn></m:mrow></m:math>. Vi viser for øvrig til resultatregnskapet i neste avsnitt.</p>
<p>✐ <strong>Oppgave 1-1 og 1-2, side 229.</strong></p>

My problem is that some of the elements also contain extra text at the end, as shown in one of the nodes returned by the XPath below:

<m:math xmlns:m="http://www.w3.org/1998/Math/MathML" alttext="kr 1037 + kr 14000 = kr 15037"><m:mrow><m:mi>kr </m:mi><m:mn>1 037</m:mn><m:mo>+</m:mo><m:mi>kr </m:mi><m:mn>14 000</m:mn><m:mo>=</m:mo><m:mi>kr </m:mi><m:mn>15 037</m:mn></m:mrow></m:math>. Vi viser for øvrig til resultatregnskapet i neste avsnitt.

I only want the m:math element. What am I doing wrong?

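That extra text is the .tail property of the _Element: the text that follows the element's closing tag, up to the next element. lxml stores it on the element itself, so the XPath still returns exactly one m:math element per match; the trailing text only shows up when the element is serialized.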
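To see where that text lives, here is a minimal sketch. The `XHTML` string is a shortened, hypothetical stand-in for the paragraph in the question, and `NS` is just an illustrative name for the namespace map.

```python
from lxml import etree

# Hypothetical, shortened stand-in for the paragraph in the question.
XHTML = (
    '<p>Resultatet er '
    '<m:math xmlns:m="http://www.w3.org/1998/Math/MathML" alttext="1037">'
    '<m:mn>1 037</m:mn></m:math>'
    '. Vi viser til neste avsnitt.</p>'
)

# Illustrative name for the prefix-to-URI map passed to xpath().
NS = {"m": "http://www.w3.org/1998/Math/MathML"}

root = etree.fromstring(XHTML)
math_elements = root.xpath(".//m:math", namespaces=NS)

first = math_elements[0]

# Text before the first child belongs to the parent's .text ...
leading_text = root.text
# ... while text after </m:math> is stored on the m:math element as .tail.
tail_text = first.tail

print(len(math_elements))
print(repr(leading_text))
print(repr(tail_text))
```

The XPath matches a single element here; the sentence after the closing tag is attached to it as `.tail` rather than being a separate node in the result.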

How you handle the tail depends on what you want to do with the element.

For example, if you're using tostring() to serialize the element, you can pass with_tail=False to leave the tail out of the serialization.
