
Python XML parsing with ElementTree returns None

I'm trying to parse this XML using ElementTree in Python. The data is stored as a string:

xml = '''<?xml version="1.0" encoding="utf-8"?>
<SearchResults xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<Student>
    <RollNumber>1</RollNumber>
    <Name>Abel</Name>
    <PhoneNumber>Not Included</PhoneNumber>
    <Email>abel@hisschool.edu</Email>
    <Grade>7</Grade>
</Student>
<Student>
    <RollNumber>2</RollNumber>
    <Name>Joseph</Name>
    <PhoneNumber>Not Included</PhoneNumber>
    <Email>joseph@hisschool.edu</Email>
    <Grade>7</Grade>
</Student>
<Student>
    <RollNumber>3</RollNumber>
    <Name>Mike</Name>
    <PhoneNumber>Not Included</PhoneNumber>
    <Email>mike@hisschool.edu</Email>
    <Grade>7</Grade>
</Student>
</SearchResults>'''

This is the code I used to parse the string as XML:

from xml.etree import ElementTree

xml = ElementTree.fromstring(xml)

results = xml.findall('Student')

for students in results:
    for student in students:
        print student.get('Name')

Printing results shows a list of Elements:

[<Element 'Student' at 0x7feb615b4ad0>, <Element 'Student' at 0x7feb615b4c50>, <Element 'Student' at 0x7feb615b4e10>]

Inside the outer for loop, printing students shows the same elements one by one:

<Element 'Student' at 0x7fd722d88ad0>
<Element 'Student' at 0x7fd722d88c50>
<Element 'Student' at 0x7fd722d88e10>

However, when I try to get the student's name with print student.get('Name'), the program prints None.

What I'm trying to do is pull the value of each tag out of the XML and build a dict.

You have a double loop here:

for students in results:
    for student in students:
        print student.get('Name')

students is a single <Student> element. Iterating over it yields the elements it contains, and those child elements (<RollNumber>, <Name>, etc.) have no Name attribute.
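To make the difference concrete, here is a small sketch using a trimmed-down copy of the XML (the sample string and variable names are only for illustration). It iterates over one <Student> element and compares what .get('Name') returns on each child with the text of the <Name> child:

```python
from xml.etree import ElementTree

# Trimmed-down sample, for illustration only.
sample = '''<SearchResults>
<Student>
    <RollNumber>1</RollNumber>
    <Name>Abel</Name>
</Student>
</SearchResults>'''

root = ElementTree.fromstring(sample)
student = root.find('Student')

# Iterating over an element yields its direct children.
child_tags = [child.tag for child in student]

# None of the children carries a Name attribute, so .get() returns None.
name_attributes = [child.get('Name') for child in student]

# The value lives in the text of the <Name> child instead.
name_text = student.find('Name').text

print(child_tags)
print(name_attributes)
print(name_text)
```

The children carry text rather than attributes, which is why the double loop prints None for every one of them.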

The .get() method only accesses attributes, but you appear to want the <Name> element. Use .find() or an XPath expression instead:

for student in results:
    name = student.find('Name')
    if name is not None:
        print(name.text)

or

for student_name in xml.findall('.//Student/Name'):
    print(student_name.text)
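Since the goal in the question is a dict per student, the same idea can be extended as in the sketch below. It assumes every child of <Student> is a simple text element with a unique tag; student_to_dict is a hypothetical helper name, and the sample is shortened for brevity.

```python
from xml.etree import ElementTree

# Shortened sample, for illustration only.
sample = '''<SearchResults>
<Student>
    <RollNumber>1</RollNumber>
    <Name>Abel</Name>
    <Grade>7</Grade>
</Student>
<Student>
    <RollNumber>2</RollNumber>
    <Name>Joseph</Name>
    <Grade>7</Grade>
</Student>
</SearchResults>'''

def student_to_dict(student):
    # Hypothetical helper: map each direct child's tag to its text.
    # Assumes tags are unique within one <Student> and hold plain text.
    return dict((child.tag, child.text) for child in student)

root = ElementTree.fromstring(sample)
students = [student_to_dict(s) for s in root.findall('Student')]

for record in students:
    print(record)
```

If a tag could repeat inside one <Student>, a plain dict would keep only the last value, so a different structure would be needed in that case.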

If you're new to XML processing:

  • lxml is a fast and powerful library for working with XML in Python. The standard library's ElementTree supports only a limited subset of XPath.
  • XPath is a query language for examining XML documents. It has a steep learning curve, but it's easy to get help with on Stack Overflow. XPath is so useful that I've started converting JSON to XML when using APIs, just so that I can write XPath queries instead of crazy nested dictionary dereferencing.

from lxml import etree
from pprint import pprint

doc = etree.XML(b'''<?xml version="1.0" encoding="utf-8"?>
<SearchResults xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<Student>
    <RollNumber>1</RollNumber>
    <Name>Abel</Name>
    <PhoneNumber>Not Included</PhoneNumber>
    <Email>abel@hisschool.edu</Email>
    <Grade>7</Grade>
</Student>
<Student>
    <RollNumber>2</RollNumber>
    <Name>Joseph</Name>
    <PhoneNumber>Not Included</PhoneNumber>
    <Email>joseph@hisschool.edu</Email>
    <Grade>7</Grade>
</Student>
<Student>
    <RollNumber>3</RollNumber>
    <Name>Mike</Name>
    <PhoneNumber>Not Included</PhoneNumber>
    <Email>mike@hisschool.edu</Email>
    <Grade>7</Grade>
</Student>
</SearchResults>''')

def first(seq,default=None):
  # Return the first item of seq, or default if seq is empty.
  for item in seq:
    return item
  return default

def simple_children_to_dict(element):
  # Map each direct child's tag to its text, e.g. {'Name': 'Abel', ...}.
  result = {}
  for child in element:
    result[child.tag] = child.text
  return result

def get_by_rollnumber(number,search_results):
  # $number is an XPath variable; lxml binds it from the keyword argument.
  student_element = first(search_results.xpath('Student[./RollNumber=$number]',number=number))
  if student_element is None:
    raise Exception("Student Number {0} not found".format(number))
  return simple_children_to_dict(student_element)

def get_all_students(search_results):
  students = []
  # Query the element that was passed in, not the global doc.
  for student_element in search_results.xpath('Student'):
    students.append(simple_children_to_dict(student_element))
  return students

Then:

>>> pprint(get_by_rollnumber(2,doc))
{'Email': 'joseph@hisschool.edu',
 'Grade': '7',
 'Name': 'Joseph',
 'PhoneNumber': 'Not Included',
 'RollNumber': '2'}
>>>
>>> pprint(get_all_students(doc))
[{'Email': 'abel@hisschool.edu',
  'Grade': '7',
  'Name': 'Abel',
  'PhoneNumber': 'Not Included',
  'RollNumber': '1'},
 {'Email': 'joseph@hisschool.edu',
  'Grade': '7',
  'Name': 'Joseph',
  'PhoneNumber': 'Not Included',
  'RollNumber': '2'},
 {'Email': 'mike@hisschool.edu',
  'Grade': '7',
  'Name': 'Mike',
  'PhoneNumber': 'Not Included',
  'RollNumber': '3'}]
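For comparison, a lookup like this can also be sketched with the standard library alone, because ElementTree's XPath subset includes the [tag='text'] predicate. ElementTree has no XPath variables, so the roll number is formatted into the path here. get_by_rollnumber_stdlib is a hypothetical name, and the sample is shortened.

```python
from xml.etree import ElementTree

# Shortened sample, for illustration only.
sample = '''<SearchResults>
<Student>
    <RollNumber>1</RollNumber>
    <Name>Abel</Name>
    <Grade>7</Grade>
</Student>
<Student>
    <RollNumber>2</RollNumber>
    <Name>Joseph</Name>
    <Grade>7</Grade>
</Student>
</SearchResults>'''

def get_by_rollnumber_stdlib(number, search_results):
    # Hypothetical stdlib counterpart of get_by_rollnumber.
    # int() makes sure only digits end up inside the quoted predicate.
    path = "Student[RollNumber='{0}']".format(int(number))
    student = search_results.find(path)
    if student is None:
        raise LookupError("Student Number {0} not found".format(number))
    return dict((child.tag, child.text) for child in student)

root = ElementTree.fromstring(sample)
joseph = get_by_rollnumber_stdlib(2, root)
print(joseph['Name'])
```

Unlike the lxml version, the predicate compares text exactly, so a roll number written as "02" in the XML would not match 2 here.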

Subtleties:

  • XPath queries usually return a list of results, because most queries can match more than one node. Hence the helper function first.
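The standard library makes the same list-versus-single-result distinction. In the sketch below (sample data and variable names are illustrative), findall() always returns a list, find() behaves much like the first helper by returning the first match or None, and findtext() accepts a default for a missing element:

```python
from xml.etree import ElementTree

# Shortened sample, for illustration only.
sample = '''<SearchResults>
<Student><Name>Abel</Name></Student>
<Student><Name>Joseph</Name></Student>
</SearchResults>'''

root = ElementTree.fromstring(sample)

# findall() returns every match as a list, possibly an empty one.
all_names = [e.text for e in root.findall('Student/Name')]
no_matches = root.findall('Student/Nickname')

# find() returns only the first match, or None when nothing matches.
first_name = root.find('Student/Name').text
missing = root.find('Student/Nickname')

# findtext() returns the first match's text, or the given default.
nickname = root.findtext('Student/Nickname', default='n/a')

print(all_names)
print(first_name)
print(missing)
print(nickname)
```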


 