
Using Python to extract information from an XML file?

Can anyone offer some help with using Python to extract information from an XML file? Here is my example XML:

<root>
    <number index="2">
        <info>
            <info.RANDOM>Random Text</info.RANDOM>
        </info>
    </number>
</root>

What I want to print is everything between the root tags, exactly as it appears: all the tags, the text between them, and the attributes inside the start tags (in this case index="2" on number). I have tried itertext(), but that strips the tags and prints only the text between the root tags. So far, I have a makeshift solution that prints only element.tag and element.text, but it does not print the closing tags or the attributes. Any help would be appreciated! :)
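A minimal sketch of the difference described above, using the standard-library xml.etree.ElementTree module. It assumes the example XML with a closing </number> tag, because ElementTree rejects input that is not well-formed. itertext() yields only text nodes, while ET.tostring() serializes an element together with its tags and attributes.

```python
import xml.etree.ElementTree as ET

# Example input, assumed to include the closing </number> tag.
s = """<root>
    <number index="2">
        <info>
            <info.RANDOM>Random Text</info.RANDOM>
        </info>
    </number>
</root>"""

root = ET.fromstring(s)

# itertext() yields only the text content; tags and attributes are lost.
text_only = "".join(root.itertext())
print(repr(text_only.strip()))  # prints 'Random Text'

# tostring() serializes the element itself: start tag with attributes,
# children, text, and end tag. encoding="unicode" returns a str.
number = root.find("number")
markup = ET.tostring(number, encoding="unicode")
print(markup)
```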

With s as your input,

s='''<root>
      <number index="2">
        <info>
            <info.RANDOM>Random Text</info.RANDOM>
        </info>
        </number>
</root>'''

Find all elements with the tag name number and serialize each one to a string with ET.tostring():

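import xml.etree.ElementTree as ET

root = ET.fromstring(s)
# './/number' matches <number> elements at any depth below root
for node in root.findall('.//number'):
    # encoding="unicode" makes tostring() return str instead of bytes (Python 3)
    print(ET.tostring(node, encoding="unicode"))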

Output:

<number index="2">
        <info>
            <info.RANDOM>Random Text</info.RANDOM>
        </info>
        </number>
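findall('.//number') only prints number elements. To print everything between the root tags regardless of tag names, one sketch (inner_xml is a hypothetical helper name) is to emit root.text and then serialize each direct child of root. ET.tostring() also writes each child's tail text, so the whitespace between and after children is kept. This reproduces the input closely but not byte-for-byte in every case: the default parser drops comments, and attribute quoting and character references are normalized on output.

```python
import xml.etree.ElementTree as ET

s = """<root>
      <number index="2">
        <info>
            <info.RANDOM>Random Text</info.RANDOM>
        </info>
        </number>
</root>"""

root = ET.fromstring(s)


def inner_xml(elem):
    """Serialize everything between elem's start tag and end tag.

    elem.text (the text before the first child) comes first; each child is
    then serialized with ET.tostring(), which includes the child's tail text.
    """
    parts = [elem.text or ""]
    for child in elem:
        parts.append(ET.tostring(child, encoding="unicode"))
    return "".join(parts)


print(inner_xml(root))
```

For this input, the result is the same string as slicing s between "<root>" and "</root>".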
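
Alternatively, with BeautifulSoup (its "xml" parser requires lxml to be installed):

from bs4 import BeautifulSoup

xml = "<root><number index=\"2\"><info><info.RANDOM>Random Text</info.RANDOM></info></number></root>"
soup = BeautifulSoup(xml, "xml")

# prettify() re-indents the markup and puts an XML declaration at the top
output = soup.prettify()
# slice out everything between "<root>\n" and "</root>"
print(output[output.find("<root>") + 7:output.rfind("</root>")])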

The + 7 skips past "<root>" (6 characters) plus the newline that prettify() places right after it.
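The + 7 offset depends on prettify() putting a newline right after <root>, and prettify() also re-indents the markup. If the goal is to keep the input exactly as written, a minimal standard-library sketch is to slice the raw string between the end of the opening root tag and the last closing root tag. The function name between_root_tags is hypothetical. This does no parsing or validation, and it assumes the root start tag is not self-closing, has no ">" inside its attribute values, and is the first tag whose name starts with "root".

```python
def between_root_tags(xml_text, tag="root"):
    """Return the raw text between the first <tag ...> start tag and the last </tag>.

    Plain string slicing: the original formatting is kept exactly, but nothing
    is parsed or validated (see the assumptions in the lead-in).
    """
    open_pos = xml_text.find("<" + tag)
    close_pos = xml_text.rfind("</" + tag + ">")
    if open_pos == -1 or close_pos == -1:
        raise ValueError(f"could not find <{tag}> ... </{tag}>")
    # Move just past the '>' that ends the start tag (this skips any attributes).
    start = xml_text.find(">", open_pos) + 1
    return xml_text[start:close_pos]


sample = "<root><number index=\"2\"><info><info.RANDOM>Random Text</info.RANDOM></info></number></root>"
print(between_root_tags(sample))
# prints: <number index="2"><info><info.RANDOM>Random Text</info.RANDOM></info></number>
```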
