
Search text and replace with lxml

I need to search for text in a multi-line XML file that contains multiple tags. My XML file looks like this:

<?xml version="1.0" encoding="utf-8"?>
<nc:data xmlns:nc="urn:ietf:params:xml:ns:netconf:base:1.0">
    <system xmlns="http://www.abc.xyz">
      <context>
            <name>context_1</name>
            <host>
                <name>xyz</name>
                <tag1>
                    <name>pqr</name>
                    <role>s1</role>
                    <tag2>test</tag2>
                </tag1>
                <tag2>
                    <name>pqr</name>
                    <role>s1</role>
                    <tag2>test</tag2>
                </tag2>              
            </host>
      </context>
    </system>
</nc:data>

I want to find every occurrence of the text "test" in the XML file and list the parent tag of each in the output. Unfortunately, I have not been able to do so.

The Python code that I have written is:

import os
import sys
from xml.dom import minidom
import xml.etree.ElementTree as ET

def xml_parsing():
    with open('file.xml', 'rt') as f:
        tree = ET.parse(f)
        for node in tree.findall('.//context'):
            print(node, node.tag, node.attrib)
            url = node.attrib.get('tag1')
            print(url)

xml_parsing()

I am getting a blank result as output and cannot get any further. I have tried both ElementTree and lxml. I believe it has something to do with the search pattern that I am passing to findall.

Please advise what I should try next.

I tried the DOM approach with minidom as well, and the code looks like this:

xmldoc = minidom.parse('file.xml')
reflist = xmldoc.getElementsByTagName('tag1')
print(reflist[0].toxml())

But this returns the complete markup of the element rather than just the value between the tags.

An XPath expression that finds an element whose text value equals test, regardless of the element's name and location in the XML document, is //*[text()='test'] or alternatively //*[.='test'].
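
The two expressions are not always interchangeable. As a rough sketch of the difference (the tiny document and the variable names below are made up for illustration), text()='test' checks each direct text node of an element, while .='test' compares the element's whole string value, that is, all of its descendant text joined together:

```python
from lxml import etree

# Hypothetical sample: <b> splits its text around a child,
# <c> has extra text inside a child element.
doc = etree.fromstring(
    b"<root><a>test</a><b>te<i/>st</b><c>test<d>x</d></c></root>"
)

# text()='test': at least one direct text-node child equals 'test'.
by_text_node = [el.tag for el in doc.xpath("//*[text()='test']")]

# .='test': the element's string value (all descendant text joined)
# equals 'test'.
by_string_value = [el.tag for el in doc.xpath("//*[.='test']")]

print(by_text_node)     # ['a', 'c']
print(by_string_value)  # ['a', 'b']
```

For the XML in the question both expressions select the same two inner tag2 elements, because those elements contain nothing but the text test.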

Consider the following working lxml example, which demonstrates finding such elements and updating their value:

from lxml import etree as ET

xml = '''<?xml version="1.0" encoding="utf-8"?>
<nc:data xmlns:nc="urn:ietf:params:xml:ns:netconf:base:1.0">
    <system xmlns="http://www.abc.xyz">
      <context>
            <name>context_1</name>
            <host>
                <name>xyz</name>
                <tag1>
                    <name>pqr</name>
                    <role>s1</role>
                    <tag2>test</tag2>
                </tag1>
                <tag2>
                    <name>pqr</name>
                    <role>s1</role>
                    <tag2>test</tag2>
                </tag2>              
            </host>
      </context>
    </system>
</nc:data>'''

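# On Python 3, lxml rejects str input that carries an encoding
# declaration, so pass bytes instead
tree = ET.fromstring(xml.encode('utf-8'))
for node in tree.xpath("//*[.='test']"):
    # update node value with new text 'foo'
    node.text = 'foo'
    print(ET.tostring(node).decode())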

Output:

<tag2 xmlns="http://www.abc.xyz" xmlns:nc="urn:ietf:params:xml:ns:netconf:base:1.0">foo</tag2>

<tag2 xmlns="http://www.abc.xyz" xmlns:nc="urn:ietf:params:xml:ns:netconf:base:1.0">foo</tag2>
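
The question also asks for the parent tag of each match, and the blank result from findall('.//context') is most likely caused by the default namespace declared on <system>. A minimal sketch of both points, assuming lxml is available (the names NS, local_name, and parent_tags are my own, not from the question):

```python
from lxml import etree

XML = b'''<?xml version="1.0" encoding="utf-8"?>
<nc:data xmlns:nc="urn:ietf:params:xml:ns:netconf:base:1.0">
    <system xmlns="http://www.abc.xyz">
      <context>
            <name>context_1</name>
            <host>
                <name>xyz</name>
                <tag1>
                    <name>pqr</name>
                    <role>s1</role>
                    <tag2>test</tag2>
                </tag1>
                <tag2>
                    <name>pqr</name>
                    <role>s1</role>
                    <tag2>test</tag2>
                </tag2>
            </host>
      </context>
    </system>
</nc:data>'''

NS = "http://www.abc.xyz"  # default namespace declared on <system>

root = etree.fromstring(XML)

# An unqualified name matches nothing, because <context> lives in NS.
unqualified = root.findall(".//context")
# Clark notation ({namespace}localname) qualifies the name explicitly.
qualified = root.findall(".//{%s}context" % NS)


def local_name(element):
    """Return the element's tag without its {namespace} prefix."""
    return etree.QName(element).localname


# For every element whose string value is 'test', record the parent's tag.
parent_tags = [local_name(node.getparent())
               for node in root.xpath("//*[.='test']")]

print(len(unqualified), len(qualified))  # 0 1
print(parent_tags)                       # ['tag1', 'tag2']
```

getparent() is specific to lxml. With the standard library's ElementTree you would have to build a child-to-parent mapping yourself.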
