Parsing XML using xml.etree in Python throws TypeError

I am writing a piece of code that extracts data from a bunch of XML documents.

The code works as intended on the files individually; however, when I iterate over the files I get a weird error.

The code goes as follows:

import xml.etree.ElementTree as ET
import os

for root,dirs,files in os.walk(path):
    for file in files:
        if file.endswith(".xml"):
            tree = ET.parse(os.path.join(root,file))
            root = tree.getroot()

When I execute the code, the following error appears:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-85cdfa81e486> in <module>()
      4     for file in files:
      5         if file.endswith(".xml"):
----> 6             tree = ET.parse(os.path.join(root,file))
      7             root = tree.getroot()

~/.pyenv/versions/3.6.0/lib/python3.6/posixpath.py in join(a, *p)
     76     will be discarded.  An empty last part will result in a path that
     77     ends with a separator."""
---> 78     a = os.fspath(a)
     79     sep = _get_sep(a)
     80     path = a

TypeError: expected str, bytes or os.PathLike object, not xml.etree.ElementTree.Element 

If I remove the last line, root = tree.getroot(), then everything starts working again. I don't have the slightest idea what's happening.

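You're using the same name (root) for two different variables in your code: one for the directory yielded by os.walk, and another for the root element of the XML tree. The first file parses fine, but the assignment then rebinds root to an Element, so the next call to os.path.join(root, file) receives an Element instead of a string: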

tree = ET.parse(os.path.join(root, file))  # root is the directory path from os.walk
root = tree.getroot()  # root is now the XML root element; this should use a different name

Use a different variable name for one of them.
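
As a minimal sketch of that fix, the loop could be written as below. The function name collect_xml_roots and the variable names dirpath, filename, and xml_root are illustrative choices, not from the original code; any names work as long as the two do not collide.

```python
import os
import xml.etree.ElementTree as ET


def collect_xml_roots(path):
    """Walk `path` and return {file path: root Element} for every .xml file.

    Hypothetical helper for illustration; the names are assumptions.
    """
    roots = {}
    # `dirpath` is the directory string yielded by os.walk; it is never reassigned.
    for dirpath, dirnames, filenames in os.walk(path):
        for filename in filenames:
            if filename.endswith(".xml"):
                full_path = os.path.join(dirpath, filename)
                tree = ET.parse(full_path)
                # A separate name for the XML root keeps `dirpath` a string.
                xml_root = tree.getroot()
                roots[full_path] = xml_root
    return roots
```

Calling collect_xml_roots(path) on a directory with several XML files should then parse all of them, since os.path.join always receives the directory string.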
