
Trouble Using LXML ETREE to Parse XML Files on Local Machine With Python

I am using Python 2.7.3 on Mac OS X and have lxml version 3.3.3 installed. I have several XML files in the same directory, for instance MyDir/file1.xml and MyDir/file2.xml. I am trying to load each one into Python and extract the relevant information. However, I can't seem to get the etree parser to work. My code is very simple:

 from lxml import etree
 from os import listdir
 from os.path import isfile, join

 xmlfiles = [x for x in listdir("MyDir") if isfile(join("MyDir",x))]

 for file in xmlfiles:
     doc = etree.parse(file)
     # get the stuff I need

However, the parser keeps raising the following error:

 Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
   File "lxml.etree.pyx", line 3239, in lxml.etree.parse (src/lxml/lxml.etree.c:69955)
   File "parser.pxi", line 1748, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:102066)
   File "parser.pxi", line 1774, in lxml.etree._parseDocumentFromURL (src/lxml/lxml.etree.c:102330)
   File "parser.pxi", line 1678, in lxml.etree._parseDocFromFile (src/lxml/lxml.etree.c:101365)
   File "parser.pxi", line 1110, in lxml.etree._BaseParser._parseDocFromFile (src/lxml/lxml.etree.c:96817)
   File "parser.pxi", line 582, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:91275)
   File "parser.pxi", line 683, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:92461)
   File "parser.pxi", line 620, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:91722)
 IOError: Error reading file 'File1.xml': failed to load external entity "File1.xml"

I've looked at several answers on here, but they are all for specific questions, mostly dealing with feeding the parser an html file whereas I'm just feeding it an xml file already stored on my local machine. Can anybody please help me figure out why this isn't working properly?

Also, is there a better way to parse and extract information from XML files using Python than the approach I'm taking (assuming I get it to work)?

Thanks

I'd rather use glob.iglob() with a *.xml file mask instead. This is more explicit and safer:

import glob
from lxml import etree

for filename in glob.iglob("MyDir/*.xml"):
    tree = etree.parse(filename)  # filename already includes the MyDir/ prefix
    print(tree.getroot())

Hope that helps.

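You're not passing the full file path to etree.parse(): os.listdir() returns bare file names, so the parser looks for each file in the current working directory rather than in MyDir and fails to load it.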

You have a few options:

  1. Change to MyDir from the shell before launching the script (fragile)
  2. Within the script, change to MyDir before your for loop (e.g. import os; os.chdir('MyDir') )
  3. Include the full path in your list comprehension, e.g.:

     xmlfiles = [join("MyDir",x) for x in listdir("MyDir") if isfile(join("MyDir",x))]
  4. Build the path in your for loop, e.g.:

     for file in xmlfiles:
         doc = etree.parse(join("MyDir",file)) # continue on

There are obviously other solutions, e.g. @alecxe's, which uses glob's iterator (it returns each file name together with its path, whereas os.listdir() returns just the file name).
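
To make the path issue concrete, here is a minimal sketch of option 3 wrapped in two small helpers. The helper names (xml_paths, root_tags), the temporary demo directory, and its two sample files are made up for illustration and are not from the question; the fallback to the standard library's ElementTree is also an assumption, added only so the sketch runs where lxml isn't installed.

```python
import os
import tempfile

try:
    from lxml import etree  # the library used in the question
except ImportError:
    # Assumption: fall back to the stdlib parser, which also offers parse()/getroot()
    import xml.etree.ElementTree as etree


def xml_paths(directory):
    """Return the full paths of the .xml files directly inside `directory`."""
    paths = []
    for name in sorted(os.listdir(directory)):
        full = os.path.join(directory, name)  # listdir() yields bare names only
        if os.path.isfile(full) and name.lower().endswith(".xml"):
            paths.append(full)
    return paths


def root_tags(directory):
    """Parse every XML file in `directory` and map file name to root tag."""
    return {os.path.basename(p): etree.parse(p).getroot().tag
            for p in xml_paths(directory)}


# Hypothetical demo data written to a throwaway directory.
demo_dir = tempfile.mkdtemp()
for name, body in [("file1.xml", "<a><b/></a>"), ("file2.xml", "<c/>")]:
    with open(os.path.join(demo_dir, name), "w") as handle:
        handle.write(body)

print(root_tags(demo_dir))
```

Because every path handed to etree.parse() already contains the directory, this works regardless of which directory the script is launched from.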

