Suppose I had the following tags in my XML file:
<?xml version="1.0" encoding="utf-8"?>
<jobs>
<job>
<P class="Beaton"><FONT size=3><SPAN style="FONT-FAMILY: Symbol; COLOR: black; mso-ascii-font-family: 'Times New Roman'">�</SPAN><SPAN style="COLOR: black"><FONT face="Times New Roman"><SPAN style="mso-spacerun: yes"> </SPAN>Position accountability<o:p></o:p></FONT></SPAN></FONT></P>
<P class="Beaton"><FONT size=3><SPAN style="FONT-FAMILY: Symbol; COLOR: black; mso-ascii-font-family: 'Times New Roman'">�</SPAN><SPAN style="COLOR: black"><FONT face="Times New Roman"> <SPAN style="mso-spacerun: yes"> </SPAN>55 FTEs <o:p></o:p></FONT></SPAN></FONT></P>
</job>
</jobs>
and below is my code:
from xml.sax.handler import ContentHandler
import xml.sax
xml_path = 'windows/xml_file.xml'
try:
    parser = xml.sax.make_parser()
    # parse() raises SAXParseException at the first well-formedness error
    parser.parse(open(xml_path))
except xml.sax.SAXParseException as e:
    print("*** PARSER error: %s" % e)
Result:
*** PARSER error: windows/xml_file.xml:4:113: not well-formed (invalid token)
Can anyone tell me what's wrong in the p tag and how to avoid this kind of error?
The problem is probably with your FONT tag: the value of the size attribute must be quoted (size="3"), otherwise this is simply not well-formed XML.
You might also run into problems with &nbsp;, which is not a predefined XML entity (although it is valid in XHTML). Also, make sure your <jobs> tag is closed properly; the last line should be </jobs>.
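To illustrate the two points above, here is a small Python 3 sketch that feeds short documents to the same SAX parser and reports whether each one parses. The helper name is_well_formed and the sample strings are made up for this illustration, and the entity case assumes the entity in question is &nbsp;.

```python
import xml.sax
from xml.sax.handler import ContentHandler


def is_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, parser message)."""
    try:
        # A bare ContentHandler ignores all events; we only care about errors.
        xml.sax.parseString(xml_text.encode("utf-8"), ContentHandler())
        return True, None
    except xml.sax.SAXParseException as e:
        return False, e.getMessage()


# Hypothetical, cut-down versions of the markup in the question.
samples = [
    ("unquoted attribute", '<jobs><job><FONT size=3>55 FTEs</FONT></job></jobs>'),
    ("quoted attribute", '<jobs><job><FONT size="3">55 FTEs</FONT></job></jobs>'),
    ("named entity", '<jobs><job>&nbsp;55 FTEs</job></jobs>'),
    ("numeric reference", '<jobs><job>&#160;55 FTEs</job></jobs>'),
]

for label, doc in samples:
    ok, message = is_well_formed(doc)
    print("%-20s %s %s" % (label, "OK" if ok else "FAILED", message or ""))
```

The quoted attribute and the numeric character reference &#160; should parse, while the unquoted attribute and the bare &nbsp; (with no DTD declaring it) should be rejected.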
In general, if you have problems reading XML files, the first thing to check is whether the file is well-formed. One way to do that is to run it through the W3C validator.
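If you would rather check locally than paste the file into an online tool, something along these lines might work. It is only a sketch in Python 3: the function name find_first_error and the throwaway demo file are assumptions for illustration, not part of the original code.

```python
import os
import tempfile
import xml.sax
from xml.sax.handler import ContentHandler


def find_first_error(xml_path):
    """Return None if the file is well-formed, else (line, column, message)."""
    try:
        # Open in binary mode so the parser honours the file's own encoding.
        with open(xml_path, "rb") as f:
            xml.sax.parse(f, ContentHandler())
    except xml.sax.SAXParseException as e:
        return e.getLineNumber(), e.getColumnNumber(), e.getMessage()
    return None


# Demonstration on a temporary file with an unquoted attribute on line 4.
demo_xml = (
    '<?xml version="1.0" encoding="utf-8"?>\n'
    '<jobs>\n'
    '<job>\n'
    '<FONT size=3>55 FTEs</FONT>\n'
    '</job>\n'
    '</jobs>\n'
)
with tempfile.NamedTemporaryFile("w", suffix=".xml", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write(demo_xml)
    demo_path = tmp.name

result = find_first_error(demo_path)
os.remove(demo_path)
print(result)
```

Note that the parser stops at the first error, so after fixing one problem you need to run the check again to find the next.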