How do I delete an element of XML in Python?

I am trying to remove some elements from an XML file with ElementTree. My code doesn't raise any error, but it doesn't do what I want. I want to enter a CHAIN_ID and a RES_POSITION and, when I look at the newly written XML file, see that this residue has been deleted.

My XML file is too large to post, so here is a sample of it:

<SEQ>
   <CHAIN>
      <CHAIN_ID>A</CHAIN_ID>
      <RESIDUE>
         <RES_POSITION>1</RES_POSITION>
         <AA_CODE>S</AA_CODE>
      </RESIDUE>
      <RESIDUE>
         <RES_POSITION>2</RES_POSITION>
         <AA_CODE>E</AA_CODE>
      </RESIDUE>
      <RESIDUE>
         <RES_POSITION>3</RES_POSITION>
         <AA_CODE>H</AA_CODE>
      </RESIDUE>

My code:

def deleted_residue(mychain_id, myresidue_id, file):
    mytree = ET.parse(file)
    chain = [seq for seq in mytree.findall('.//CHAIN') if seq.findtext('.//CHAIN_ID') == mychain_id]
    sequence = [res for res in mytree.findall('.//RESIDUE') if res.findtext('.//RES_POSITION') == myresidue_id]
    for seq in chain:
        for res in sequence:
            if mychain_id == "A" and myresidue_id == "2":
                seq.remove(res)
                return deleted_residue("A", "2", "pdb_one_letter.xml")

ET.tostring(SEQ, encoding='utf8').decode('utf8')
tree.write("pdb_one_letter_deleted.xml")
from xml.dom import minidom

pdbtoxml = minidom.parseString(ET.tostring(SEQ)).toprettyxml(indent="   ")
with open("pdb_one_letter_deleted.xml", "w") as pdb:
    pdb.write(pdbtoxml)

Your code is a little confusing, especially the list comprehension part and the use of minidom.

Based on this:

I am trying to remove some elements from an XML file with ElementTree. My code doesn't raise any error, but it doesn't do what I want. I want to enter a CHAIN_ID and a RES_POSITION and, when I look at the newly written XML file, see that this residue has been deleted.

I think you can simplify by doing the value testing in XPath predicates ...

XML Input (test.xml)

<SEQ>
   <CHAIN>
      <CHAIN_ID>A</CHAIN_ID>
      <RESIDUE>
         <RES_POSITION>1</RES_POSITION>
         <AA_CODE>S</AA_CODE>
      </RESIDUE>
      <RESIDUE>
         <RES_POSITION>2</RES_POSITION>
         <AA_CODE>E</AA_CODE>
      </RESIDUE>
      <RESIDUE>
         <RES_POSITION>3</RES_POSITION>
         <AA_CODE>H</AA_CODE>
      </RESIDUE>
   </CHAIN>
</SEQ>

Python 3.x (3.6+ for the f-strings)

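import xml.etree.ElementTree as ET

def deleted_residue(mychain_id, myresidue_id, file):
    tree = ET.parse(file)
    # Select only the CHAIN elements whose CHAIN_ID child has the requested text.
    for chain in tree.findall(f".//CHAIN[CHAIN_ID='{mychain_id}']"):
        # findall returns a list, so removing from the chain while looping is safe.
        for residue in chain.findall(f"./RESIDUE[RES_POSITION='{myresidue_id}']"):
            chain.remove(residue)
    # Overwrite the input file with the modified tree.
    tree.write(file)

deleted_residue("A", "2", "test.xml")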

XML Output (modified test.xml)

<SEQ>
   <CHAIN>
      <CHAIN_ID>A</CHAIN_ID>
      <RESIDUE>
         <RES_POSITION>1</RES_POSITION>
         <AA_CODE>S</AA_CODE>
      </RESIDUE>
      <RESIDUE>
         <RES_POSITION>3</RES_POSITION>
         <AA_CODE>H</AA_CODE>
      </RESIDUE>
   </CHAIN>
</SEQ>

If you need to remove more than one RESIDUE, it would make more sense to parse the XML outside of the function and pass in the tree instead.
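As a rough sketch of that idea, the variant below takes an already-parsed tree plus a list of (chain id, residue position) pairs. The function name delete_residues, the targets parameter, and the returned count are my own assumptions for illustration, not part of the original code. The XML is parsed from a string here only to keep the example self-contained.

```python
import xml.etree.ElementTree as ET

def delete_residues(tree, targets):
    """Remove each (chain_id, res_position) pair in targets; return how many were removed.

    Hypothetical helper: the name and signature are illustrative assumptions.
    """
    removed = 0
    for chain_id, res_position in targets:
        # Same XPath predicates as above, applied to a tree parsed by the caller.
        for chain in tree.findall(f".//CHAIN[CHAIN_ID='{chain_id}']"):
            for residue in chain.findall(f"./RESIDUE[RES_POSITION='{res_position}']"):
                chain.remove(residue)
                removed += 1
    return removed

xml_text = """<SEQ>
   <CHAIN>
      <CHAIN_ID>A</CHAIN_ID>
      <RESIDUE><RES_POSITION>1</RES_POSITION><AA_CODE>S</AA_CODE></RESIDUE>
      <RESIDUE><RES_POSITION>2</RES_POSITION><AA_CODE>E</AA_CODE></RESIDUE>
      <RESIDUE><RES_POSITION>3</RES_POSITION><AA_CODE>H</AA_CODE></RESIDUE>
   </CHAIN>
</SEQ>"""

# Parse once, delete several residues, then inspect what is left.
tree = ET.ElementTree(ET.fromstring(xml_text))
removed = delete_residues(tree, [("A", "1"), ("A", "3")])
remaining = [r.findtext("RES_POSITION") for r in tree.findall(".//RESIDUE")]
print(removed, remaining)
```

With a real file you would presumably call ET.parse("test.xml") once, run the deletions, and then call tree.write(...) a single time at the end, so the file is read and written only once however many residues are removed.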

Hopefully this helps.
