Remove Empty XML Elements - Python

I am attempting to remove empty XML elements from an XML document, but I am having an issue with elements that have attributes but no text values. I can successfully remove the empty XML elements, but cannot preserve the elements with attributes in the final XML. I would like to clean up the XML by removing empty nodes with no text values altogether, while keeping the nodes that have attributes.

Below is the script I am using, along with the input and (desired) output XMLs. Any assistance is most appreciated!

The Script:

from lxml import etree
import os

path = "C:\\users\\mdl518\\Desktop\\"

### Removing empty XML elements
tree = etree.parse(os.path.join(path,"my_file.xml"))

for elem in tree.xpath('//*[not(node())]'):
    elem.getparent().remove(elem)

with open(".//new_file.xml","wb") as f:
    f.write(etree.tostring(tree, xml_declaration=True, encoding='utf-8')) ## Removes empty XML elements, including the elements with attributes

Input XML:

<?xml version='1.0' encoding='utf-8'?>
<nas:metadata xmlns:nas="http://www.arcgis.com/schema/nas/base"   
xmlns:mcc="http://standards.org/iso/19115/-3/mcc/1.0"    
xmlns:mdl="http://standards.org/iso/19115/-3/mdl/1.0" 
xmlns:mnl="http://standards.org/iso/19115/-3/mnl/1.0">
xmlns:lan="http://standards.org/iso/19115/-3/lan/1.0">
xmlns:lis="http://standards.org/iso/19115/-3/lis/1.0">
xmlns:gam="http://standards.org/iso/19115/-3/gam/1.0">

  <mdl:metadataIdentifier>
    <mcc:MD_Identifier>
        <mnl:type>
          <gam:String>The Metadata File</gam:String>
        </mnl:type>
          <mnl:description codeList="http://arcgis.com/codelist/ScopeCode" codeListValue="dataset"/>
         <mnl:address>
          <mnl:defaultLocale>
          </mnl:defaultLocale>
         </mnl:address>
         <lan:language>
           <lan:type>
             <lis:name>English</lis:name>
           </lan:type>
          </lan:language>
      </mcc:MD_Identifier>
      <mcc:contactInfo>
        <mdl:POC>
          <mnl:name>
            <lis:person>Tom</lis:person>
          </mnl:name>
          <mnl:age>
          </mnl:age>
          <mnl:status>
          </mnl:status>
        </mdl:POC>
      </mcc:contactInfo>
    </mdl:metadataIdentifier>
 </nas:metadata>

Output XML:

<?xml version='1.0' encoding='utf-8'?>
<nas:metadata xmlns:nas="http://www.arcgis.com/schema/nas/base"   
xmlns:mcc="http://standards.org/iso/19115/-3/mcc/1.0"    
xmlns:mdl="http://standards.org/iso/19115/-3/mdl/1.0" 
xmlns:mnl="http://standards.org/iso/19115/-3/mnl/1.0">
xmlns:lan="http://standards.org/iso/19115/-3/lan/1.0">
xmlns:lis="http://standards.org/iso/19115/-3/lis/1.0">
xmlns:gam="http://standards.org/iso/19115/-3/gam/1.0">

  <mdl:metadataIdentifier>
    <mcc:MD_Identifier>
        <mnl:type>
          <gam:String>The Metadata File</gam:String>
        </mnl:type>
        <mnl:description codeList="http://arcgis.com/codelist/ScopeCode" codeListValue="dataset"/>
      <lan:language>
        <lan:type>
          <lis:name>English</lis:name>
        </lan:type>
       </lan:language>
     </mcc:MD_Identifier>
     <mcc:contactInfo>
       <mdl:POC>
         <mnl:name>
           <lis:person>Tom</lis:person>
         </mnl:name>
       </mdl:POC>
     </mcc:contactInfo>
   </mdl:metadataIdentifier>
 </nas:metadata>

The XML in your question is not well formed, but assuming that's fixed, try changing this line

for elem in tree.xpath('//*[not(node())]'):

to this:

for elem in tree.xpath('//*[not(node())][not(count(./@*))>0]'):

and see if it works.
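To see what each predicate selects, here is a minimal sketch on a made-up document. The tag names `empty`, `flagged`, `blank`, `full` and the helper `names` are illustrative assumptions, not part of the question's XML. It suggests two things: `not(count(./@*))>0` appears to behave like the shorter `not(@*)`, and `not(node())` skips elements whose only content is whitespace, because that whitespace is itself a text node. Elements such as `<mnl:defaultLocale>` in the input fall into the second case, so a `normalize-space(.)` test may be needed for them.

```python
from lxml import etree

# Hypothetical miniature document, not the XML from the question.
doc = etree.XML(
    b"<root>"
    b"<empty/>"
    b"<flagged code='x'/>"
    b"<blank>\n  </blank>"
    b"<full>text</full>"
    b"</root>"
)


def names(xpath):
    """Return the tag names of the elements matched by an XPath expression."""
    return [el.tag for el in doc.xpath(xpath)]


# Original predicate: matches childless elements, with or without attributes.
no_nodes = names("//*[not(node())]")

# Suggested predicate and a shorter spelling of the same attribute test.
verbose_attr_test = names("//*[not(node())][not(count(./@*))>0]")
short_attr_test = names("//*[not(node())][not(@*)]")

# Whitespace-tolerant variant: no attributes and no non-blank text content.
blank_tolerant = names("//*[not(@*)][not(normalize-space(.))]")

print(no_nodes)
print(verbose_attr_test, short_attr_test)
print(blank_tolerant)
```

Note that `normalize-space(.)` looks at the text of all descendants, so an attribute-free element that wraps only attribute-bearing empty children would also match the last expression.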

Edit:

The edited XML in the question still isn't well formed. I tried to fix it and then applied the following:

from lxml import etree

xml_str = """<?xml version='1.0' encoding='utf-8'?>
<nas:metadata xmlns:nas="http://www.arcgis.com/schema/nas/base"   
xmlns:mcc="http://standards.org/iso/19115/-3/mcc/1.0"    
xmlns:mdl="http://standards.org/iso/19115/-3/mdl/1.0" 
xmlns:mnl="http://standards.org/iso/19115/-3/mnl/1.0"
xmlns:lan="http://standards.org/iso/19115/-3/lan/1.0"
xmlns:lis="http://standards.org/iso/19115/-3/lis/1.0"
xmlns:gam="http://standards.org/iso/19115/-3/gam/1.0">

  <mdl:metadataIdentifier>
    <mcc:MD_Identifier>
        <mnl:type>
          <gam:String>The Metadata File</gam:String>
        </mnl:type>
          <mnl:description codeList="http://arcgis.com/codelist/ScopeCode" codeListValue="dataset"/>
         <mnl:address>
          <mnl:defaultLocale>
          </mnl:defaultLocale>
         </mnl:address>
         <lan:language>
           <lan:type>
             <lis:name>English</lis:name>
           </lan:type>
          </lan:language>
      </mcc:MD_Identifier>
      <mcc:contactInfo>
        <mdl:POC>
          <mnl:name>
            <lis:person>Tom</lis:person>
          </mnl:name>
          <mnl:age>
          </mnl:age>
          <mnl:status>
          </mnl:status>
        </mdl:POC>
      </mcc:contactInfo>
    </mdl:metadataIdentifier>
 </nas:metadata>

"""
doc = etree.XML(xml_str.encode())
for elem in doc.xpath('//*[not(count(./@*))>0][not(normalize-space(.))]'):
    elem.getparent().remove(elem)
print(etree.tostring(doc, xml_declaration=True, encoding='utf-8').decode())

The output I get from the above is the desired output in the question.
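If the real files are less regular than the sample, a plain-Python pass may be easier to reason about than a single XPath. The following is only a sketch under stated assumptions: `prune_empty` is a hypothetical helper name, "empty" is taken to mean no attributes, no non-blank text, and no remaining children, and the sample document is made up. It walks the tree bottom-up, so a wrapper like `<mnl:address>` is removed once its empty child is gone, while an attribute-free parent of an attribute-bearing element is kept. It also moves any non-blank tail text aside first, because lxml's `remove()` drops an element's tail along with it.

```python
from lxml import etree


def prune_empty(root):
    """Remove elements with no attributes, no non-blank text and no children.

    Hypothetical helper: the definition of "empty" here is an assumption.
    """
    # Reverse document order visits descendants before their ancestors.
    for elem in reversed(list(root.iter("*"))):
        parent = elem.getparent()
        if parent is None:
            continue  # never remove the root element
        has_text = bool(elem.text and elem.text.strip())
        if elem.attrib or has_text or len(elem):
            continue  # keep: has attributes, text, or surviving children
        # remove() also discards elem.tail, so rescue non-blank tail text.
        if elem.tail and elem.tail.strip():
            previous = elem.getprevious()
            if previous is not None:
                previous.tail = (previous.tail or "") + elem.tail
            else:
                parent.text = (parent.text or "") + elem.tail
        parent.remove(elem)
    return root


# Made-up sample covering the interesting cases.
sample = etree.XML(
    b"<r><a><b>\n</b></a><c k='v'/><d><e k='v'/></d><f>x</f><g/>tail</r>"
)
cleaned = prune_empty(sample)
remaining = [el.tag for el in cleaned.iter("*")]
print(remaining)
print(etree.tostring(cleaned).decode())
```

Whitespace-only tails are discarded with the removed elements, so the indentation of the result may differ from the input.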
