I am using Python 2.7.3 on Mac OS X and have lxml version 3.3.3 installed. I have several XML files in the same directory, for instance MyDir/file1.xml and MyDir/file2.xml. I am trying to bring each one into Python and extract the relevant information. However, I can't seem to get the etree parser to work. My code is very simple:
from lxml import etree
from os import listdir
from os.path import isfile, join
xmlfiles = [x for x in listdir("MyDir") if isfile(join("MyDir",x))]
for file in xmlfiles:
    doc = etree.parse(file)
    # get the stuff I need
However, the parser keeps throwing the following error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "lxml.etree.pyx", line 3239, in lxml.etree.parse (src/lxml/lxml.etree.c:69955)
  File "parser.pxi", line 1748, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:102066)
  File "parser.pxi", line 1774, in lxml.etree._parseDocumentFromURL (src/lxml/lxml.etree.c:102330)
  File "parser.pxi", line 1678, in lxml.etree._parseDocFromFile (src/lxml/lxml.etree.c:101365)
  File "parser.pxi", line 1110, in lxml.etree._BaseParser._parseDocFromFile (src/lxml/lxml.etree.c:96817)
  File "parser.pxi", line 582, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:91275)
  File "parser.pxi", line 683, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:92461)
  File "parser.pxi", line 620, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:91722)
IOError: Error reading file 'File1.xml': failed to load external entity "File1.xml"
I've looked at several answers on here, but they are all for specific questions, mostly dealing with feeding the parser an html file whereas I'm just feeding it an xml file already stored on my local machine. Can anybody please help me figure out why this isn't working properly?
Also, is there a better way to parse and extract information from XML files using Python than the approach I'm taking (assuming I get it to work)?
Thanks
I'd use glob.iglob() with a *.xml file mask instead. This is more explicit and safer, and it yields each file name with the directory already prepended:
import glob
from lxml import etree

for filename in glob.iglob("MyDir/*.xml"):
    tree = etree.parse(filename)
    print(tree.getroot())
Hope that helps.
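You're not providing the full file path: os.listdir() returns bare file names, so etree.parse() looks for them in the current working directory instead of in MyDir, hence the failure trying to load the file.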
You have a few options:
Change into MyDir from the shell before launching the script (fragile).
Change into MyDir before your for loop, e.g. import os; os.chdir('MyDir').
Include the full path in your list comprehension, e.g.:
    xmlfiles = [join("MyDir", x) for x in listdir("MyDir") if isfile(join("MyDir", x))]
Build the path in your for loop, e.g.:
    for file in xmlfiles:
        doc = etree.parse(join("MyDir", file))
        # continue on
There are obviously other solutions, e.g. @alecxe's, which uses glob's iterator (it returns each file name with its path, whereas os.listdir() returns just the file name).
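To make the difference concrete, here is a minimal sketch of the "include the path" option. The helper name xml_paths and the throwaway directory are assumptions made up for illustration; they are not part of lxml or the standard library. It only shows that os.listdir() hands back bare names while the joined paths point at real files.

```python
import os
import tempfile
from os.path import isfile, join


def xml_paths(directory):
    """Return directory-qualified paths of the .xml files in `directory`."""
    # xml_paths is a hypothetical helper name used only for this sketch.
    return sorted(
        join(directory, name)
        for name in os.listdir(directory)
        if name.endswith(".xml") and isfile(join(directory, name))
    )


# Throwaway directory standing in for MyDir (illustration only).
demo_dir = tempfile.mkdtemp()
for name in ("file1.xml", "file2.xml", "notes.txt"):
    with open(join(demo_dir, name), "w") as handle:
        handle.write("<root/>")

# listdir() gives names relative to demo_dir, with no directory part.
bare_names = sorted(os.listdir(demo_dir))

# Joining the directory back on gives paths that resolve from anywhere.
paths = xml_paths(demo_dir)
```

Each entry in paths can then be passed to etree.parse(), whereas the entries in bare_names would only resolve if the current working directory happened to be the directory being listed.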