
Parsing some XML fields to a text file in Python

I'm trying to parse an XML file into a .txt file (mainly to get the body of each Text element), but the for loop never runs, so no results are appended to the file. I know I'm missing something about the XML. I tried to create an outer for loop that calls findall for MAEC_Bundle before finding the behaviors (I think it's needed because it's the root?).

This is the XML file:

<MAEC_Bundle xmlns:ns1="http://xml/metadataSharing.xsd" xmlns="http://maec.mitre.org/XMLSchema/maec-core-1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maec.mitre.org/XMLSchema/maec-core-1 file:MAEC_v1.1.xsd" id="maec:thug:bnd:1" schema_version="1.100000">
    <Analyses>
        <Analysis start_datetime="2019-11-25 21:41:59.491211" id="maec:thug:ana:2" analysis_method="Dynamic">
            <Tools_Used>
                <Tool id="maec:thug:tol:1">
                    <Name>Thug</Name>
                    <Version>0.9.40</Version>
                    <Organization>The Honeynet Project</Organization>
                </Tool>
            </Tools_Used>
        </Analysis>
    </Analyses>
    <Behaviors>
        <Behavior id="maec:thug:bhv:4">
            <Description>
                <Text>[window open redirection] about:blank -&gt; http://desbloquear.celularmovel.com/</Text>
            </Description>
            <Discovery_Method tool_id="maec:thug:tol:1" method="Dynamic Analysis"/>
        </Behavior>
        <Behavior id="maec:thug:bhv:5">
            <Description>
                <Text>[HTTP] URL: http://desbloquear.celularmovel.com/ (Status: 200, Referer: None)</Text>
            </Description>
            <Discovery_Method tool_id="maec:thug:tol:1" method="Dynamic Analysis"/>
        </Behavior>
        <Behavior id="maec:thug:bhv:6">
            <Description>
                <Text>[HTTP] URL: http://desbloquear.celularmovel.com/ (Content-type: text/html, MD5: f1fb042c62910c34be16ad91cbbd71fa)</Text>
            </Description>
            <Discovery_Method tool_id="maec:thug:tol:1" method="Dynamic Analysis"/>
        </Behavior>
        <Behavior id="maec:thug:bhv:7">
            <Description>
                <Text>[meta redirection] http://desbloquear.celularmovel.com/ -&gt; http://desbloquear.celularmovel.com/cgi-sys/defaultwebpage.cgi</Text>
            </Description>
            <Discovery_Method tool_id="maec:thug:tol:1" method="Dynamic Analysis"/>
        </Behavior>
        <Behavior id="maec:thug:bhv:8">
            <Description>
                <Text>[HTTP] URL: http://desbloquear.celularmovel.com/cgi-sys/defaultwebpage.cgi (Status: 200, Referer: http://desbloquear.celularmovel.com/)</Text>
            </Description>
            <Discovery_Method tool_id="maec:thug:tol:1" method="Dynamic Analysis"/>
        </Behavior>
        <Behavior id="maec:thug:bhv:9">
            <Description>
                <Text>[HTTP] URL: http://desbloquear.celularmovel.com/cgi-sys/defaultwebpage.cgi (Content-type: text/html, MD5: a28fe921afb898e60cc334e06f71f46e)</Text>
            </Description>
            <Discovery_Method tool_id="maec:thug:tol:1" method="Dynamic Analysis"/>
        </Behavior>
    </Behaviors>
    <Pools/>
</MAEC_Bundle>

This is the Python parsing code. It only writes "Operation" to the file and never enters the loop:

import xml.etree.ElementTree as ET


def logsParsing():
    tree = ET.parse(
        'analysis.xml')
    root = tree.getroot()
    with open('sample1.txt', 'w') as f:
        f.write('Operation\n')
        with open('sample1.txt', 'a') as f:
            for behavior in root.findall('Behaviors'):
                operation = behavior.find('Behavior').find('Description').find('Text').text
                line_to_write = operation + '\n'
                f.write(line_to_write)
    f.close()


logsParsing()

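The loop never runs because every element in the file belongs to the default namespace declared on the root (xmlns="http://maec.mitre.org/XMLSchema/maec-core-1"). ElementTree stores such tags as {http://maec.mitre.org/XMLSchema/maec-core-1}Behaviors, so root.findall('Behaviors') matches nothing. See [Python 3.Docs]: xml.etree.ElementTree - The ElementTree XML API, in particular the following sections: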

  • Parsing XML with Namespaces
  • XPath support
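To make both problems in the question's loop concrete, here is a small sketch against a trimmed-down inline copy of the XML (the two-Behavior sample string is an assumption so the snippet runs on its own). Besides the namespace, behavior.find('Behavior') returns only the first match, so even with the namespace fixed, the original loop would write a single line. behavior_texts is a hypothetical helper name, not something from the question or the answer.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for analysis.xml (two Behavior elements only).
SAMPLE = """<MAEC_Bundle xmlns="http://maec.mitre.org/XMLSchema/maec-core-1">
    <Behaviors>
        <Behavior id="maec:thug:bhv:4">
            <Description><Text>first</Text></Description>
        </Behavior>
        <Behavior id="maec:thug:bhv:5">
            <Description><Text>second</Text></Description>
        </Behavior>
    </Behaviors>
</MAEC_Bundle>"""

# Clark-notation prefix for the default namespace declared on the root.
NS = "{http://maec.mitre.org/XMLSchema/maec-core-1}"

root = ET.fromstring(SAMPLE)

# The namespace URI is part of every tag name ElementTree reports...
print(root.tag)                             # {http://maec.mitre.org/XMLSchema/maec-core-1}MAEC_Bundle
# ...so the bare name matches nothing and the question's loop body never runs.
print(root.findall("Behaviors"))            # []
print(len(root.findall(NS + "Behaviors")))  # 1


def behavior_texts(bundle):
    """Hypothetical helper: the question's nested loop with both bugs fixed."""
    texts = []
    for behaviors in bundle.findall(NS + "Behaviors"):
        # find("Behavior") would stop at the first child; findall() visits every one.
        for behavior in behaviors.findall(NS + "Behavior"):
            text = behavior.findtext(NS + "Description/" + NS + "Text")
            if text is not None:  # skip a Behavior without Description/Text
                texts.append(text)
    return texts


print(behavior_texts(root))  # ['first', 'second']
```

With the real file, behavior_texts(ET.parse("analysis.xml").getroot()) should return the six Text bodies. Writing them out needs only one with open("sample1.txt", "w") block, so the nested append-mode open and the trailing f.close() in the question are unnecessary.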

Here's one way of handling it.

code00.py:

#!/usr/bin/env python3

import sys
import xml.etree.ElementTree as ET


def main():
    tree = ET.parse("analysis.xml")
    root_node = tree.getroot()
    namespaces = {
        "xmlns": "http://maec.mitre.org/XMLSchema/maec-core-1",  # Namespace (default) from XML file (this is the only one we need, as tags that matter to us are not prefixed)
    }
    xpath = "./{0:s}:Behaviors/{0:s}:Behavior/{0:s}:Description/{0:s}:Text".format("xmlns")  # Compute each "Text" node full path
    print("Nodes to search: {0:s}".format(xpath))
    text_nodes = root_node.findall(xpath, namespaces)
    with open("sample1.txt", "w") as fout:  # Only open the out file once
        node_count = 0
        fout.write("Operation:\n")
        for text_node in text_nodes:
            fout.write(text_node.text + "\n")
            node_count += 1
        print("Wrote {0:d} nodes info.".format(node_count))


if __name__ == "__main__":
    print("Python {0:s} {1:d}bit on {2:s}\n".format(" ".join(item.strip() for item in sys.version.split("\n")), 64 if sys.maxsize > 0x100000000 else 32, sys.platform))
    main()
    print("\nDone.")

Output:

[cfati@CFATI-5510-0:e:\Work\Dev\StackOverflow\q059057339]> "e:\Work\Dev\VEnvs\py_064_03.07.03_test0\Scripts\python.exe" code00.py
Python 3.7.3 (v3.7.3:ef4ec6ed12, Mar 25 2019, 22:22:05) [MSC v.1916 64 bit (AMD64)] 64bit on win32

Nodes to search: ./xmlns:Behaviors/xmlns:Behavior/xmlns:Description/xmlns:Text
Wrote 6 nodes info.

Done.

[cfati@CFATI-5510-0:e:\Work\Dev\StackOverflow\q059057339]> type sample1.txt
Operation:
[window open redirection] about:blank -> http://desbloquear.celularmovel.com/
[HTTP] URL: http://desbloquear.celularmovel.com/ (Status: 200, Referer: None)
[HTTP] URL: http://desbloquear.celularmovel.com/ (Content-type: text/html, MD5: f1fb042c62910c34be16ad91cbbd71fa)
[meta redirection] http://desbloquear.celularmovel.com/ -> http://desbloquear.celularmovel.com/cgi-sys/defaultwebpage.cgi
[HTTP] URL: http://desbloquear.celularmovel.com/cgi-sys/defaultwebpage.cgi (Status: 200, Referer: http://desbloquear.celularmovel.com/)
[HTTP] URL: http://desbloquear.celularmovel.com/cgi-sys/defaultwebpage.cgi (Content-type: text/html, MD5: a28fe921afb898e60cc334e06f71f46e)
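On Python 3.8 or newer, the namespace-prefix handling in code00.py can be shortened. The sketch below (the function names and the inline sample are assumptions for illustration) shows three ways to collect the same Text bodies. The first maps the empty prefix '' to the default namespace. The second uses the {*} namespace wildcard. The third calls Element.iter() with the fully qualified tag, which also works on older versions.

```python
import xml.etree.ElementTree as ET

NS = "http://maec.mitre.org/XMLSchema/maec-core-1"

# Small stand-in for analysis.xml, so the sketch runs on its own.
SAMPLE = """<MAEC_Bundle xmlns="http://maec.mitre.org/XMLSchema/maec-core-1">
    <Behaviors>
        <Behavior><Description><Text>first</Text></Description></Behavior>
        <Behavior><Description><Text>second</Text></Description></Behavior>
    </Behaviors>
</MAEC_Bundle>"""


def texts_default_prefix(root):
    # Python 3.8+: an empty-string prefix maps every unprefixed name in the path to NS.
    nodes = root.findall("./Behaviors/Behavior/Description/Text", {"": NS})
    return [node.text for node in nodes]


def texts_wildcard(root):
    # Python 3.8+: "{*}" matches the local name in any namespace (or in none).
    nodes = root.findall("./{*}Behaviors/{*}Behavior/{*}Description/{*}Text")
    return [node.text for node in nodes]


def texts_iter(root):
    # Any Python 3: iter() walks the whole subtree, so it would also pick up
    # Text elements outside Behaviors (the question's file has none).
    return [node.text for node in root.iter("{%s}Text" % NS)]


root = ET.fromstring(SAMPLE)
print(texts_default_prefix(root))  # ['first', 'second']
print(texts_wildcard(root))        # ['first', 'second']
print(texts_iter(root))            # ['first', 'second']
```

For the real file, pass ET.parse("analysis.xml").getroot() instead of the sample root. The iter() variant is the least strict of the three, since it matches Text elements at any depth rather than only under Behaviors/Behavior/Description.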
