Parsing issue with XML special characters

I'm parsing an XML document received from a web service, using SAX.

One of the fields is a link, like the following:

<link_site>
   http://www.ownhosting.com/webservice_332.asp?id_user=21395&amp;id_parent=33943
</link_site>

I need to read this link and save it, but only this part gets saved: id_parent=33943

Parser snippet:

//inside method startElement
else if(localName.equals("link_site")){
    this.in_link=true;
}
...
//inside method endElement
else if(localName.equals("link_site")){
     this.in_link=false;
}

Then, inside method characters, I read the content:

else if(this.in_link){
    xmlparsing.setOrderLink(count, Html.fromHtml(new String(ch, start, length)).toString());
}//I get it and put in a HashMap<Integer,String>

I know that this issue is caused by the encoding of the special characters.

What can I do?
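A quick way to check what the parser actually delivers is to record every characters() call separately instead of joining them. The sketch below is plain Java SE (no Android Html class); the class name ChunkDemo, the method collectChunks and the one-line sample string are assumptions made for illustration. With the JDK's built-in parser the pieces typically break around the &, but the SAX contract allows a parser to split character data anywhere, so the exact chunks may differ. Joining all of them gives back the full URL, with &amp; already decoded to &.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class ChunkDemo {
    // The sample element from the question, with the URL on a single line (assumption)
    static final String XML =
            "<link_site>http://www.ownhosting.com/webservice_332.asp?id_user=21395&amp;id_parent=33943</link_site>";

    // Returns every piece of text passed to characters(), one list entry per call
    public static List<String> collectChunks(String xml) throws Exception {
        List<String> chunks = new ArrayList<>();
        SAXParserFactory.newInstance().newSAXParser().parse(
                new InputSource(new StringReader(xml)),
                new DefaultHandler() {
                    @Override
                    public void characters(char[] ch, int start, int length) {
                        // Record each call on its own instead of overwriting or joining
                        chunks.add(new String(ch, start, length));
                    }
                });
        return chunks;
    }

    public static void main(String[] args) throws Exception {
        List<String> chunks = collectChunks(XML);
        for (String c : chunks) {
            System.out.println("chunk: [" + c + "]");
        }
        // Concatenating all chunks restores the complete, already-decoded URL
        System.out.println("joined: " + String.join("", chunks));
    }
}
```

If the output shows more than one chunk, code that stores the value inside characters() keeps only whatever arrived last.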

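The &amp; entity makes the parser split the text, so characters() is called several times for a single element (SAX parsers are allowed to deliver character data in any number of chunks). Your code saves the value on every call, overwriting the previous one, which is why only the last piece (id_parent=33943) survives. You need to concatenate the chunks and use the result once the element ends, something like this: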

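    import java.io.File;
    import javax.xml.parsers.SAXParserFactory;
    import org.xml.sax.Attributes;
    import org.xml.sax.SAXException;
    import org.xml.sax.helpers.DefaultHandler;

    public class Main {
        public static void main(String[] args) throws Exception {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new File("1.xml"), new DefaultHandler() {
                        // Collects all characters() chunks of the current element
                        StringBuilder url = new StringBuilder();
                        // Name of the most recently opened element ("" before any)
                        String element = "";

                        @Override
                        public void startElement(String uri, String localName, String qName,
                                Attributes attributes) throws SAXException {
                            element = qName;
                            url.setLength(0);
                        }

                        @Override
                        public void characters(char[] ch, int start, int length) throws SAXException {
                            if (element.equals("link_site")) {
                                // Append, don't overwrite: the text may arrive in several pieces
                                url.append(ch, start, length);
                            }
                        }

                        @Override
                        public void endElement(String uri, String localName, String qName)
                                throws SAXException {
                            if (element.equals("link_site")) {
                                // The element is closed, so the text is now complete
                                System.out.println(url.toString().trim());
                                element = "";
                            }
                        }
                    });
        }
    }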

For a 1.xml file containing the <link_site> element from the question, this prints:

http://www.ownhosting.com/webservice_332.asp?id_user=21395&id_parent=33943
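Applied to the structure of the handler in the question, the same fix might look like the following minimal sketch. LinkHandler, getLinks and the count key are hypothetical stand-ins for the question's xmlparsing.setOrderLink(count, ...) and HashMap<Integer,String>, and the sample is parsed from a string so the example runs without a file. Like the answer, it compares qName: the SAX documentation specifies that localName is the empty string when namespace processing is not performed, which is the default for SAXParserFactory.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler mirroring the question's in_link flag and HashMap<Integer,String>
public class LinkHandler extends DefaultHandler {
    private final Map<Integer, String> links = new HashMap<>();
    private final StringBuilder buffer = new StringBuilder();
    private boolean inLink = false;
    private int count = 0; // hypothetical key, standing in for the question's count

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if ("link_site".equals(qName)) {
            inLink = true;
            buffer.setLength(0); // start collecting a new link
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (inLink) {
            buffer.append(ch, start, length); // append every chunk, never overwrite
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("link_site".equals(qName)) {
            inLink = false;
            // Only now is the text complete, so store it exactly once per element
            links.put(count++, buffer.toString().trim());
        }
    }

    public Map<Integer, String> getLinks() {
        return links;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<links>"
                + "<link_site>\n   http://www.ownhosting.com/webservice_332.asp?id_user=21395&amp;id_parent=33943\n</link_site>"
                + "</links>";
        LinkHandler handler = new LinkHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        System.out.println(handler.getLinks());
    }
}
```

In the Android code from the question this means keeping the in_link flag, appending in characters(), and calling setOrderLink only in endElement(). The Html.fromHtml call should not be needed there, since the SAX parser already turns &amp; into &.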
