Using ElementTree, how do I parse a tag within a tag?

I am new to coding and I'm trying to parse the following fields out of this entry:

Name,Category,Risk,Member

I seem to have written code that gets me 3 of the 4 fields, but for some reason, when I try to get the text from the member field, I get an error message. Please tell me what I am doing wrong. Again, I am new, so if you have an easier way of doing this, I'm open to suggestions.

<application>
  <entry id="120" name="100bao" ori_country="USA" ori_language="English">
   <category>general-internet</category>
   <subcategory>file-sharing</subcategory>
   <technology>peer-to-peer</technology>
   <evasive-behavior>yes</evasive-behavior>
   <consume-big-bandwidth>yes</consume-big-bandwidth>
   <used-by-malware>yes</used-by-malware>
   <able-to-transfer-file>yes</able-to-transfer-file>
   <has-known-vulnerability>yes</has-known-vulnerability>
   <tunnel-other-application>no</tunnel-other-application>
   <prone-to-misuse>yes</prone-to-misuse>
   <pervasive-use>yes</pervasive-use>
   <risk>5</risk>
   <references>
     <entry name="www.100bao.com">
      <link>http://www.100bao.com/</link>
     </entry>
   </references>
   <per-direction-regex>no</per-direction-regex>
   <appident>yes</appident>
   <default>
     <port>
       <member>tcp/3468,6346,11300</member>
     </port>
   </default>
 </entry>

import xml.etree.ElementTree as ET

mytree = ET.parse('C:/Documents/Parse Folder/apps.xml')
root = mytree.getroot()

for entry in root.findall('entry'):
    category = entry.find('category').text
    risk = entry.find('risk').text
    member = entry.find('member').text

print(entry.attrib, category, risk, member)

member = entry.find('member').text
AttributeError: 'NoneType' object has no attribute 'text'
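Element.find() returns None when no matching element is found, and None has no .text attribute, which is exactly what the traceback reports. The sketch below reproduces that on a trimmed-down inline copy of the entry (an assumption standing in for apps.xml) and shows findtext(), which returns a default instead of raising when nothing matches.

```python
import xml.etree.ElementTree as ET

# Trimmed-down stand-in for apps.xml (assumption: same nesting as the question)
xml_text = """<application>
  <entry name="100bao">
    <risk>5</risk>
    <default><port><member>tcp/3468,6346,11300</member></port></default>
  </entry>
</application>"""

root = ET.fromstring(xml_text)
entry = root.find('entry')

# 'member' is not a direct child of <entry>, so find() returns None
print(entry.find('member'))

# findtext() returns the given default when no element matches
print(entry.findtext('member', default='n/a'))
print(entry.findtext('risk'))
```

This prints None, then n/a, then 5. Using findtext() with a default is a judgment call: it avoids the crash, but it can also hide a wrong path, so an explicit check may be preferable while debugging.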

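It's because member isn't a direct child of entry; it is nested inside default and then port. Given a bare tag name, find() only looks at direct children, so you need to supply a path (ElementTree supports a limited subset of XPath):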

member = entry.find('./default/port/member').text

(Untested since the code in your question is untestable as-is.)
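As a sketch of the path syntax (on an inline copy of the relevant part of the entry, assumed here for illustration), both an explicit relative path and the descendant search './/member' reach the nested element:

```python
import xml.etree.ElementTree as ET

entry = ET.fromstring("""<entry name="100bao">
  <default>
    <port>
      <member>tcp/3468,6346,11300</member>
    </port>
  </default>
</entry>""")

# Explicit relative path: matches only <default>/<port>/<member>
print(entry.find('./default/port/member').text)

# './/member' matches a <member> at any depth below <entry>;
# convenient if the nesting varies, but find() still returns only the first match
print(entry.find('.//member').text)
```

Both lines print tcp/3468,6346,11300. The explicit path is stricter and will not silently pick up a <member> from some other branch of the entry.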

UPDATED WITH TESTED CODE

apps.xml (updated to be well-formed)

<application>
    <entry id="120" name="100bao" ori_country="USA" ori_language="English">
        <category>general-internet</category>
        <subcategory>file-sharing</subcategory>
        <technology>peer-to-peer</technology>
        <evasive-behavior>yes</evasive-behavior>
        <consume-big-bandwidth>yes</consume-big-bandwidth>
        <used-by-malware>yes</used-by-malware>
        <able-to-transfer-file>yes</able-to-transfer-file>
        <has-known-vulnerability>yes</has-known-vulnerability>
        <tunnel-other-application>no</tunnel-other-application>
        <prone-to-misuse>yes</prone-to-misuse>
        <pervasive-use>yes</pervasive-use>
        <risk>5</risk>
        <references>
            <entry name="www.100bao.com">
                <link>http://www.100bao.com/</link>
            </entry>
        </references>
        <per-direction-regex>no</per-direction-regex>
        <appident>yes</appident>
        <default>
            <port>
                <member>tcp/3468,6346,11300</member>
            </port>
        </default>
    </entry>
</application>
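The snippet in the question ends without a closing </application> tag, and ElementTree refuses to parse such a document at all, before any find() call runs. A minimal sketch with a tiny illustrative document (not the real file):

```python
import xml.etree.ElementTree as ET

broken = "<application><entry name='100bao'></entry>"  # missing </application>
fixed = broken + "</application>"

parse_failed = False
try:
    ET.fromstring(broken)
except ET.ParseError as err:
    # The parser reports the problem together with a line/column position
    parse_failed = True
    print("ParseError:", err)

root = ET.fromstring(fixed)
print(root.tag, len(root))  # application 1
```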

Python

import xml.etree.ElementTree as ET

mytree = ET.parse('apps.xml')
root = mytree.getroot()

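for entry in root.findall('entry'):
    # name is an attribute of <entry>, not a child element, so read it with get()
    name = entry.get('name')
    category = entry.find('category').text
    risk = entry.find('risk').text
    # member is nested under <default><port>, so pass a relative path to find()
    member = entry.find('default/port/member').text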

    print(f'Name: "{name}"\nCategory: "{category}"\nRisk: "{risk}"\nMember: "{member}"')

Output

Name: "100bao"
Category: "general-internet"
Risk: "5"
Member: "tcp/3468,6346,11300"
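One detail worth knowing for this file format: the entry also contains a nested <entry> under <references>. The loop above uses root.findall('entry'), which only matches direct children of <application>, so the nested reference entry is not picked up. root.iter('entry') would walk the whole tree and include it. A small sketch on an inline, trimmed-down document (assumed for illustration):

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""<application>
  <entry name="100bao">
    <references>
      <entry name="www.100bao.com"><link>http://www.100bao.com/</link></entry>
    </references>
  </entry>
</application>""")

# findall('entry') matches direct children of <application> only
top_level = [e.get('name') for e in root.findall('entry')]
print(top_level)  # ['100bao']

# iter('entry') walks the whole subtree, so the nested reference entry appears too
everywhere = [e.get('name') for e in root.iter('entry')]
print(everywhere)  # ['100bao', 'www.100bao.com']
```

For this task findall('entry') is the right choice, since the nested reference entries have no category, risk, or member and would fail with the same AttributeError.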
