
Python xml.sax error: "not well-formed (invalid token)"

Suppose I had the following tags in my XML file:

<?xml version="1.0" encoding="utf-8"?>
<jobs>
<job>
<P class="Beaton"><FONT size=3><SPAN style="FONT-FAMILY: Symbol; COLOR: black; mso-ascii-font-family: 'Times New Roman'">�</SPAN><SPAN style="COLOR: black"><FONT face="Times New Roman"><SPAN style="mso-spacerun: yes">&nbsp; </SPAN>Position accountability<o:p></o:p></FONT></SPAN></FONT></P>
<P class="Beaton"><FONT size=3><SPAN style="FONT-FAMILY: Symbol; COLOR: black; mso-ascii-font-family: 'Times New Roman'">�</SPAN><SPAN style="COLOR: black"><FONT face="Times New Roman"> <SPAN style="mso-spacerun: yes">&nbsp;</SPAN>55 FTEs <o:p></o:p></FONT></SPAN></FONT></P>
</job>
</jobs>

and below is my code:

from xml.sax.handler import ContentHandler
import xml.sax

xml_path = 'windows/xml_file.xml'

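try:
    parser = xml.sax.make_parser()
    # parse() accepts a path, so the parser reads the file's bytes itself
    parser.parse(xml_path)
except xml.sax.SAXParseException as e:
    # "except ... as" and print() run on both Python 2.6+ and Python 3
    print("*** PARSER error: %s" % e)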

Result:
*** PARSER error: windows/xml_file.xml:4:113: not well-formed (invalid token)

Can anyone tell me what's wrong in the p tag and how to avoid this kind of error?

The problem is probably with your FONT tag: XML requires every attribute value to be quoted, so size=3 has to be written as size="3". Without the quotes the document is not well-formed XML, and the parser stops at that point.
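
As a rough illustration of the quoting rule, here is a minimal sketch that feeds two shortened versions of the FONT line to xml.sax. The helper name check and both sample strings are made up for this example; they are not taken from the original file.

```python
import xml.sax

def check(xml_text):
    """Return None if xml_text is well-formed, else the parser's error message."""
    try:
        # parseString() takes the document as bytes plus a handler;
        # a bare ContentHandler simply ignores every event.
        xml.sax.parseString(xml_text.encode("utf-8"), xml.sax.ContentHandler())
    except xml.sax.SAXParseException as exc:
        return exc.getMessage()
    return None

# Hypothetical, shortened versions of the line from the question.
unquoted = '<job><FONT size=3>Position accountability</FONT></job>'
quoted = '<job><FONT size="3">Position accountability</FONT></job>'

print(check(unquoted))  # an error message from the parser
print(check(quoted))    # None
```

Only the quoted variant should get through; the unquoted one is rejected before any of its content is read.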

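You might also run into problems with &nbsp;, which is not one of the entities XML predefines (those are only &lt;, &gt;, &amp;, &apos; and &quot;). It works in XHTML because the XHTML DTD declares it; in plain XML, use the numeric reference &#160; instead. Also, your <jobs> tag is not closed properly; the last line should be </jobs>.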
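
A small sketch of the entity problem and one possible workaround follows. It assumes a plain text replacement of &nbsp; with &#160; is acceptable for your data; TextCollector, parse_text and the sample string are hypothetical names used only here.

```python
import xml.sax

class TextCollector(xml.sax.ContentHandler):
    """Collects all character data seen while parsing."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def characters(self, content):
        # The parser may deliver text in several pieces, so keep them all.
        self.chunks.append(content)

def parse_text(xml_text):
    """Parse xml_text and return its concatenated character data."""
    handler = TextCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return "".join(handler.chunks)

# Hypothetical sample with no DTD, so &nbsp; is never declared.
with_entity = "<job><SPAN>&nbsp;</SPAN>55 FTEs</job>"

try:
    parse_text(with_entity)
except xml.sax.SAXParseException as exc:
    print("rejected:", exc.getMessage())

# Swap the HTML entity for the equivalent numeric character reference.
fixed = with_entity.replace("&nbsp;", "&#160;")
print(repr(parse_text(fixed)))
```

After the replacement the text parses, and the non-breaking space arrives as the character U+00A0.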

In general, if you have problems reading an XML file, the first thing to check is whether the file is well-formed. One way to do that is to run it through the W3C validator.
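
If you would rather check locally, something along these lines may be enough. It is only a sketch: find_first_error is a made-up helper, and the in-memory sample stands in for a real file. Keep in mind that the parser stops at the first fatal error, so you have to fix that one and run the check again to find the next.

```python
import io
import xml.sax

def find_first_error(source):
    """Return (line, column, message) for the first well-formedness error.

    source may be a file path or a binary file object.
    Returns None when the document parses cleanly.
    """
    try:
        xml.sax.parse(source, xml.sax.ContentHandler())
    except xml.sax.SAXParseException as exc:
        return exc.getLineNumber(), exc.getColumnNumber(), exc.getMessage()
    return None

# Hypothetical three-line document with an unquoted attribute on line 2.
sample = io.BytesIO(b"<jobs>\n<job><FONT size=3>x</FONT></job>\n</jobs>\n")
result = find_first_error(sample)
print(result)
```

For a file on disk you would pass the path instead, for example find_first_error('windows/xml_file.xml').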
