
Trying to parse an XML file using Python 3

Hoping y'all can help me out. I am relatively new to Python. I have what I need working in PowerShell, but it seems much easier to access the XML elements through PowerShell objects than in Python. In PowerShell, I can simply do

[xml]$test = Get-Content .\test.xml

and then iterate through the object to find the information I need. Full disclosure: while XML seems easy, I get tripped up by the lingo. Here is a trimmed-down version of the XML file:

<?xml version="1.0" encoding="UTF-8"?>
<!--DISA STIG Viewer :: 2.9-->
<CHECKLIST>
    <ASSET>
        <ROLE>None</ROLE>
        <ASSET_TYPE>Computing</ASSET_TYPE>
        <HOST_NAME></HOST_NAME>
        <HOST_IP></HOST_IP>
        <HOST_MAC></HOST_MAC>
        <HOST_FQDN></HOST_FQDN>
        <TECH_AREA></TECH_AREA>
        <TARGET_KEY>2266</TARGET_KEY>
        <WEB_OR_DATABASE>false</WEB_OR_DATABASE>
        <WEB_DB_SITE></WEB_DB_SITE>
        <WEB_DB_INSTANCE></WEB_DB_INSTANCE>
    </ASSET>
    <STIGS>
        <iSTIG>
            <STIG_INFO>
                <SI_DATA>
                    <SID_NAME>version</SID_NAME>
                    <SID_DATA>5</SID_DATA>
                </SI_DATA>
                <SI_DATA>
                    <SID_NAME>classification</SID_NAME>
                    <SID_DATA>UNCLASSIFIED</SID_DATA>
                </SI_DATA>
                <SI_DATA>
                    <SID_NAME>customname</SID_NAME>
                </SI_DATA>
                <SI_DATA>
                    <SID_NAME>stigid</SID_NAME>
                    <SID_DATA>McAfee_VirusScan88_Managed_Client</SID_DATA>
                </SI_DATA>
                <SI_DATA>
                    <SID_NAME>description</SID_NAME>
                    <SID_DATA>The McAfee VirusScan Managed Client STIG is published as a tool to improve the security of Department of Defense (DoD) information systems. The requirements are derived from the NIST 800-53 and related documents. Comments or proposed revisions to this document should be sent via e-mail to the following address: disa.stig_spt@mail.mil.</SID_DATA>
                </SI_DATA>
                <SI_DATA>
                    <SID_NAME>filename</SID_NAME>
                    <SID_DATA>U_McAfee_VirusScan88_Managed_Client_STIG_V5R21_Manual-xccdf.xml</SID_DATA>
                </SI_DATA>
                <SI_DATA>
                    <SID_NAME>releaseinfo</SID_NAME>
                    <SID_DATA>Release: 21 Benchmark Date: 25 Oct 2019</SID_DATA>
                </SI_DATA>
                <SI_DATA>
                    <SID_NAME>title</SID_NAME>
                    <SID_DATA>McAfee VirusScan 8.8 Managed Client STIG</SID_DATA>
                </SI_DATA>
                <SI_DATA>
                    <SID_NAME>uuid</SID_NAME>
                    <SID_DATA>1a441b95-b269-4423-8a40-a34f56441f5a</SID_DATA>
                </SI_DATA>
                <SI_DATA>
                    <SID_NAME>notice</SID_NAME>
                    <SID_DATA>terms-of-use</SID_DATA>
                </SI_DATA>
                <SI_DATA>
                    <SID_NAME>source</SID_NAME>
                </SI_DATA>
            </STIG_INFO>
            <VULN>
                <STIG_DATA>
                    <VULN_ATTRIBUTE>Vuln_Num</VULN_ATTRIBUTE>
                    <ATTRIBUTE_DATA>V-6453</ATTRIBUTE_DATA>
                </STIG_DATA>
                <STIG_DATA>
                    <VULN_ATTRIBUTE>Severity</VULN_ATTRIBUTE>
                    <ATTRIBUTE_DATA>high</ATTRIBUTE_DATA>
                </STIG_DATA>
                <STIG_DATA>
                    <VULN_ATTRIBUTE>Group_Title</VULN_ATTRIBUTE>
                    <ATTRIBUTE_DATA>DTAM001-McAfee VirusScan Control Panel </ATTRIBUTE_DATA>
                </STIG_DATA>
                <STIG_DATA>
                    <VULN_ATTRIBUTE>Rule_ID</VULN_ATTRIBUTE>
                    <ATTRIBUTE_DATA>SV-55134r1_rule</ATTRIBUTE_DATA>
                </STIG_DATA>
                <STIG_DATA>
                    <VULN_ATTRIBUTE>Rule_Ver</VULN_ATTRIBUTE>
                    <ATTRIBUTE_DATA>DTAM001</ATTRIBUTE_DATA>
                </STIG_DATA>
                <STIG_DATA>
                    <VULN_ATTRIBUTE>Rule_Title</VULN_ATTRIBUTE>
                    <ATTRIBUTE_DATA>McAfee VirusScan On-Access General Policies must be configured to enable on-access scanning at system startup.
</ATTRIBUTE_DATA>
                </STIG_DATA>
                <STIG_DATA>
                    <VULN_ATTRIBUTE>Vuln_Discuss</VULN_ATTRIBUTE>
                    <ATTRIBUTE_DATA>For antivirus software to be effective, it must be running at all times, beginning from the point of the system's initial startup. Otherwise, the risk is greater for viruses, trojans, and other malware infecting the system during that startup phase.
</ATTRIBUTE_DATA>
                </STIG_DATA>
                <STIG_DATA>
                    <VULN_ATTRIBUTE>IA_Controls</VULN_ATTRIBUTE>
                    <ATTRIBUTE_DATA></ATTRIBUTE_DATA>
                </STIG_DATA>
                <STIG_DATA>
                    <VULN_ATTRIBUTE>Check_Content</VULN_ATTRIBUTE>
                    <ATTRIBUTE_DATA>From the ePO server console System Tree, select the Systems tab, select the asset to be checked, select Actions, select Agent, and select Modify Policies on a Single System. From the product pull down list, select VirusScan Enterprise 8.8.0. Select from the Policy column the policy associated with the On-Access General Policies. Under the General tab, locate the "Enable on-access scanning:" label. Ensure the "Enable on-access scanning at system startup" option is selected.

Criteria:  If the "Enable on-access scanning at startup" option is selected, this is not a finding. 

On the client machine, use the Windows Registry Editor to navigate to the following key:
HKLM\Software\McAfee\ (32-bit)
HKLM\Software\Wow6432Node\McAfee\ (64-bit)
SystemCore\VSCore\On Access Scanner\McShield\Configuration

Criteria:  If the value of bStartDisabled is 0, this is not a finding. If the value is 1, this is a finding.</ATTRIBUTE_DATA>
                </STIG_DATA>
                <STIG_DATA>
                    <VULN_ATTRIBUTE>Fix_Text</VULN_ATTRIBUTE>
                    <ATTRIBUTE_DATA>From the ePO server console System Tree, select the Systems tab, select the asset to be checked, select Actions, select Agent, and select Modify Policies on a Single System. From the product pull down list, select VirusScan Enterprise 8.8.0. Select from the Policy column the policy associated with the On-Access General Policies. Under the General tab, locate the "Enable on-access scanning:" label. Select the "Enable on-access scanning at system startup" option. Select Save.</ATTRIBUTE_DATA>
                </STIG_DATA>
                <STIG_DATA>
                    <VULN_ATTRIBUTE>False_Positives</VULN_ATTRIBUTE>
                    <ATTRIBUTE_DATA></ATTRIBUTE_DATA>
                </STIG_DATA>
                <STIG_DATA>
                    <VULN_ATTRIBUTE>False_Negatives</VULN_ATTRIBUTE>
                    <ATTRIBUTE_DATA></ATTRIBUTE_DATA>
                </STIG_DATA>
                <STIG_DATA>
                    <VULN_ATTRIBUTE>Documentable</VULN_ATTRIBUTE>
                    <ATTRIBUTE_DATA>false</ATTRIBUTE_DATA>
                </STIG_DATA>
                <STIG_DATA>
                    <VULN_ATTRIBUTE>Mitigations</VULN_ATTRIBUTE>
                    <ATTRIBUTE_DATA></ATTRIBUTE_DATA>
                </STIG_DATA>
                <STIG_DATA>
                    <VULN_ATTRIBUTE>Potential_Impact</VULN_ATTRIBUTE>
                    <ATTRIBUTE_DATA></ATTRIBUTE_DATA>
                </STIG_DATA>
                <STIG_DATA>
                    <VULN_ATTRIBUTE>Third_Party_Tools</VULN_ATTRIBUTE>
                    <ATTRIBUTE_DATA></ATTRIBUTE_DATA>
                </STIG_DATA>
                <STIG_DATA>
                    <VULN_ATTRIBUTE>Mitigation_Control</VULN_ATTRIBUTE>
                    <ATTRIBUTE_DATA></ATTRIBUTE_DATA>
                </STIG_DATA>
                <STIG_DATA>
                    <VULN_ATTRIBUTE>Responsibility</VULN_ATTRIBUTE>
                    <ATTRIBUTE_DATA>System Administrator</ATTRIBUTE_DATA>
                </STIG_DATA>
                <STIG_DATA>
                    <VULN_ATTRIBUTE>Security_Override_Guidance</VULN_ATTRIBUTE>
                    <ATTRIBUTE_DATA></ATTRIBUTE_DATA>
                </STIG_DATA>
                <STIG_DATA>
                    <VULN_ATTRIBUTE>Check_Content_Ref</VULN_ATTRIBUTE>
                    <ATTRIBUTE_DATA>M</ATTRIBUTE_DATA>
                </STIG_DATA>
                <STIG_DATA>
                    <VULN_ATTRIBUTE>Weight</VULN_ATTRIBUTE>
                    <ATTRIBUTE_DATA>10.0</ATTRIBUTE_DATA>
                </STIG_DATA>
                <STIG_DATA>
                    <VULN_ATTRIBUTE>Class</VULN_ATTRIBUTE>
                    <ATTRIBUTE_DATA>Unclass</ATTRIBUTE_DATA>
                </STIG_DATA>
                <STIG_DATA>
                    <VULN_ATTRIBUTE>STIGRef</VULN_ATTRIBUTE>
                    <ATTRIBUTE_DATA>McAfee VirusScan 8.8 Managed Client STIG :: Version 5, Release: 21 Benchmark Date: 25 Oct 2019</ATTRIBUTE_DATA>
                </STIG_DATA>
                <STIG_DATA>
                    <VULN_ATTRIBUTE>TargetKey</VULN_ATTRIBUTE>
                    <ATTRIBUTE_DATA>2266</ATTRIBUTE_DATA>
                </STIG_DATA>
                <STIG_DATA>
                    <VULN_ATTRIBUTE>CCI_REF</VULN_ATTRIBUTE>
                    <ATTRIBUTE_DATA>CCI-001242</ATTRIBUTE_DATA>
                </STIG_DATA>
                <STATUS>Not_Reviewed</STATUS>
                <FINDING_DETAILS></FINDING_DETAILS>
                <COMMENTS></COMMENTS>
                <SEVERITY_OVERRIDE></SEVERITY_OVERRIDE>
                <SEVERITY_JUSTIFICATION></SEVERITY_JUSTIFICATION>
            </VULN>
        </iSTIG>
    </STIGS>
</CHECKLIST>
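Since the terminology is part of the confusion, here is a small sketch using the standard library's ElementTree on one SI_DATA block copied from the checklist above. It labels the pieces: an element has a tag, child elements, text, and (optionally) attributes. This checklist keeps everything in child elements and text; none of the elements shown carry attributes.

```python
import xml.etree.ElementTree as ET

# One SI_DATA block copied from the checklist above
snippet = "<SI_DATA><SID_NAME>version</SID_NAME><SID_DATA>5</SID_DATA></SI_DATA>"
element = ET.fromstring(snippet)

print(element.tag)                        # SI_DATA  (the element's tag name)
print([child.tag for child in element])   # ['SID_NAME', 'SID_DATA']  (its child elements)
print(element[1].text)                    # 5  (text content of the second child)
print(element.attrib)                     # {}  (this element has no attributes)
```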

I know there are a couple of different ways to do this, but I was trying minidom first:

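import xml.dom.minidom

doc = xml.dom.minidom.parse(r'C:\Temp\test.xml')
print(doc.nodeName)         # '#document': the Document node, not the root element
# Use documentElement: doc.firstChild may be the leading <!--DISA STIG Viewer--> comment
root = doc.documentElement
print(root.tagName)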

This gives me CHECKLIST, which is indeed the root element of the document. In PowerShell, I would do root.STIGS.iSTIG.STIG_INFO.SI_DATA and start a loop from there, but I'm having trouble wrapping my head around why Python is so different.
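minidom has no PowerShell-style dotted access to child elements; the usual substitute is getElementsByTagName, which searches all descendants. A minimal sketch, assuming a trimmed copy of the checklist (SAMPLE) and a hypothetical helper child_text written for this example; for the real file, minidom.parse(r'C:\Temp\test.xml') would replace parseString:

```python
from xml.dom import minidom

# Trimmed copy of the question's checklist (assumption: same nesting as the real file)
SAMPLE = """<!--DISA STIG Viewer :: 2.9-->
<CHECKLIST>
  <STIGS>
    <iSTIG>
      <STIG_INFO>
        <SI_DATA><SID_NAME>version</SID_NAME><SID_DATA>5</SID_DATA></SI_DATA>
        <SI_DATA><SID_NAME>classification</SID_NAME><SID_DATA>UNCLASSIFIED</SID_DATA></SI_DATA>
        <SI_DATA><SID_NAME>customname</SID_NAME></SI_DATA>
      </STIG_INFO>
    </iSTIG>
  </STIGS>
</CHECKLIST>"""


def child_text(parent, tag):
    """Hypothetical helper: text of the first <tag> descendant, or '' if absent or empty."""
    nodes = parent.getElementsByTagName(tag)
    if not nodes or nodes[0].firstChild is None:
        return ""
    return nodes[0].firstChild.data


doc = minidom.parseString(SAMPLE)
root = doc.documentElement  # <CHECKLIST>; doc.firstChild is the leading comment here

# Map each SID_NAME to its SID_DATA (empty string when SID_DATA is missing)
si_data = {}
for si in root.getElementsByTagName("SI_DATA"):
    si_data[child_text(si, "SID_NAME")] = child_text(si, "SID_DATA")

print(root.tagName)
print(si_data)
```

Because getElementsByTagName looks at every descendant, it is fine for SI_DATA blocks like these, but on deeper trees it can match elements you did not intend.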

I also tried starting with ElementTree but didn't get far:

from xml.etree import ElementTree as ET
doc = ET.parse(r'C:\Temp\test.xml').getroot()
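The closest ElementTree analogue of PowerShell's dotted path is a slash-separated path passed to findall(), plus findtext() to read a child's text. Since getroot() already returns <CHECKLIST>, the path starts at STIGS. A sketch under the assumption that the real file has the same nesting as the trimmed SAMPLE below:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's checklist (assumption: same nesting as the real file)
SAMPLE = """<CHECKLIST>
  <STIGS>
    <iSTIG>
      <STIG_INFO>
        <SI_DATA><SID_NAME>version</SID_NAME><SID_DATA>5</SID_DATA></SI_DATA>
        <SI_DATA><SID_NAME>classification</SID_NAME><SID_DATA>UNCLASSIFIED</SID_DATA></SI_DATA>
        <SI_DATA><SID_NAME>customname</SID_NAME></SI_DATA>
      </STIG_INFO>
    </iSTIG>
  </STIGS>
</CHECKLIST>"""

# For the real file: root = ET.parse(r'C:\Temp\test.xml').getroot()
root = ET.fromstring(SAMPLE)

# Path is relative to root (<CHECKLIST>), so CHECKLIST itself is not repeated
rows = []
for si in root.findall("STIGS/iSTIG/STIG_INFO/SI_DATA"):
    name = si.findtext("SID_NAME")
    value = si.findtext("SID_DATA", default="")  # customname has no SID_DATA child
    rows.append((name, value))

for name, value in rows:
    print(name, value)
```

findtext() returns the default when the child element is missing and an empty string when the child exists but has no text, so both cases end up as "" here.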

Can anyone point me in the right direction without necessarily giving me the written code as an answer? I already transformed my XML using lxml and was able to output the file shown above, which is great, but I'm having trouble with the next step.

Thanks!

Since you are looking for a general direction, try something like this and modify it to your needs:

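from lxml import etree

# Paste the XML as a *bytes* literal: lxml refuses a str that carries an
# encoding declaration such as <?xml version="1.0" encoding="UTF-8"?>
stig = b"""your xml above"""
parser = etree.XMLParser()

tree = etree.fromstring(stig, parser)
# Every SI_DATA element under any iSTIG/STIG_INFO
items = tree.xpath('//iSTIG/STIG_INFO//SI_DATA')
for item in items:
    # string(...) yields '' when a child is missing (e.g. customname has no SID_DATA)
    print(item.xpath('string(SID_NAME/text())'), " ", item.xpath('string(SID_DATA/text())'))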

Output:

version   5
classification   UNCLASSIFIED

etc.

Obviously, instead of printing you can add each item to a list and so on.
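Following that suggestion, one option is to turn each VULN into a dictionary keyed by VULN_ATTRIBUTE. This sketch swaps in the standard library's ElementTree to avoid the lxml dependency (lxml's elements support the same iter/findall/findtext calls); the trimmed SAMPLE and the vulns list are assumptions made for illustration:

```python
import xml.etree.ElementTree as ET

# Trimmed copy of one VULN from the question (assumption: same nesting as the real file)
SAMPLE = """<CHECKLIST><STIGS><iSTIG>
  <VULN>
    <STIG_DATA><VULN_ATTRIBUTE>Vuln_Num</VULN_ATTRIBUTE><ATTRIBUTE_DATA>V-6453</ATTRIBUTE_DATA></STIG_DATA>
    <STIG_DATA><VULN_ATTRIBUTE>Severity</VULN_ATTRIBUTE><ATTRIBUTE_DATA>high</ATTRIBUTE_DATA></STIG_DATA>
    <STIG_DATA><VULN_ATTRIBUTE>Rule_Ver</VULN_ATTRIBUTE><ATTRIBUTE_DATA>DTAM001</ATTRIBUTE_DATA></STIG_DATA>
    <STIG_DATA><VULN_ATTRIBUTE>IA_Controls</VULN_ATTRIBUTE><ATTRIBUTE_DATA></ATTRIBUTE_DATA></STIG_DATA>
    <STATUS>Not_Reviewed</STATUS>
  </VULN>
</iSTIG></STIGS></CHECKLIST>"""

root = ET.fromstring(SAMPLE)

vulns = []
for vuln in root.iter("VULN"):  # every VULN anywhere below CHECKLIST
    # One dict per VULN: attribute name -> stripped value ("" when empty)
    record = {
        sd.findtext("VULN_ATTRIBUTE"): (sd.findtext("ATTRIBUTE_DATA") or "").strip()
        for sd in vuln.findall("STIG_DATA")
    }
    record["STATUS"] = vuln.findtext("STATUS", default="")
    vulns.append(record)

for v in vulns:
    print(v["Vuln_Num"], v["Severity"], v["STATUS"])
```

The strip() matters for fields such as Rule_Title in the real file, whose text ends with a newline.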
