Unable to parse a value containing a special character using a SAX parser

I am new to parsing. I'm trying to write a parser, but I can't get the value of a particular tag when that value contains an ampersand (&). Please help me find a solution.

My XML file looks like this:

<system>
<u_id>10145</u_id>
<serial_no>1800015</serial_no>
<branch_name>B & P Infotech Ltd.</branch_name>
</system>

I have tried the following Java code, but it does not give me the expected output.

Main class

package com.satya.xmltest;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

public class SaxTest {

    public static void main(String[] args) {
        SAXParserFactory parserFactory = SAXParserFactory.newInstance();
        SaxtestHandler handler=new SaxtestHandler();
        try {
            SAXParser parser = parserFactory.newSAXParser();
            parser.parse("C:\\Users\\abc\\Desktop\\test.xml", handler);
        } catch (Exception e) {
        }
        SystemTo systemTo=handler.systemTo;
        System.out.println("Uid :"+systemTo.getUid());
        System.out.println("serial number :"+systemTo.getSerialNumber());
        System.out.println("name :"+systemTo.getName());
    }
}

Handler class

This class does the parsing and stores the parsed values in the data container class.

package com.satya.xmltest;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class SaxtestHandler extends DefaultHandler {
    String content = "";
    SystemTo systemTo=new SystemTo();

    @Override
    public void startElement(String uri, String localName, String qName,
        Attributes attributes) throws SAXException {

        switch (qName) {
            case "system":
                System.out.println("inside company");
                break;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName)
        throws SAXException {
        switch (qName) {
            case "u_id":
                systemTo.setUid(content);
                break;
            case "serial_no":
                systemTo.setSerialNumber(content);
                break;
            case "branch_name":
                systemTo.setName(content);
                break;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length)
        throws SAXException {
        content = String.copyValueOf(ch, start, length).trim();
    }
}

Data container class

package com.satya.xmltest;

public class SystemTo {

    private String uid;
    private String serialNumber;
    private String name;
    public String getUid() {
        return uid;
    }
    public void setUid(String uid) {
        this.uid = uid;
    }
    public String getSerialNumber() {
        return serialNumber;
    }
    public void setSerialNumber(String serialNumber) {
        this.serialNumber = serialNumber;
    }
    public String getName() {
        return name;
    }
    public void setName(String name) {
        this.name = name;
    }
}

My output is:

Uid: 10145
serial number: 1800015
name: null

But I need:

Uid: 10145
serial number: 1800015
name: B & P Infotech Ltd.

Thanks in advance.

Some characters must not appear in their literal form in an XML document, except when used as markup delimiters or within a comment, a processing instruction, or a CDATA section. Strictly, this rule applies to & and <; the other three only need escaping in certain contexts (for example, quotes inside attribute values), but escaping all five is always safe.
The characters and the entity or numeric reference that replaces each one:

Original Character    XML entity replacement      XML numeric replacement

      "                     &quot;                       &#34;   
      <                     &lt;                         &#60;   
      >                     &gt;                         &#62;
      &                     &amp;                        &#38;
      '                     &apos;                       &#39;   

You must replace these characters in the XML before you parse it.
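If you control the code that writes the XML, the replacement in the table can be done while generating the file. The following is a minimal sketch, assuming a hypothetical helper named `escapeXml` (not part of the original post or of the JDK) that is applied to plain text before it is placed between tags:

```java
final class XmlEscaper {

    // Hypothetical helper: replaces the five predefined XML characters
    // in plain text with their entity references.
    static String escapeXml(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String name = "B & P Infotech Ltd.";
        // prints <branch_name>B &amp; P Infotech Ltd.</branch_name>
        System.out.println("<branch_name>" + escapeXml(name) + "</branch_name>");
    }
}
```

Apply it only to raw text, never to text that is already escaped, otherwise `&amp;` would turn into `&amp;amp;`.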

Alternatively, you may use a CDATA section for text that is not markup and is meant to be read as character data.

You can escape these characters the same way HTML does:

<branch_name>B &amp; P Infotech Ltd.</branch_name>

Or you can use a CDATA section:

<branch_name><![CDATA[B & P Infotech Ltd.]]></branch_name>
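Both forms can be checked with a small SAX run. The sketch below is an illustration, not the asker's code: the class and method names (`BranchNameDemo`, `parseBranchName`) are made up for the example. It also appends every `characters()` chunk to a `StringBuilder`, because a SAX parser is allowed to deliver the text of one element in several calls, and commonly does so around an entity reference such as `&amp;`. A handler that overwrites its buffer on each call can end up with only the last chunk.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

final class BranchNameDemo {

    // Collects the text of <branch_name>, appending every characters() chunk.
    static final class BranchNameHandler extends DefaultHandler {
        private final StringBuilder buffer = new StringBuilder();
        String branchName;

        @Override
        public void startElement(String uri, String localName, String qName,
                                 Attributes attributes) {
            buffer.setLength(0); // start fresh for each element
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            buffer.append(ch, start, length); // may be called more than once
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("branch_name".equals(qName)) {
                branchName = buffer.toString().trim();
            }
            buffer.setLength(0);
        }
    }

    // Parses the XML text and returns the content of <branch_name>.
    static String parseBranchName(String xml) {
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            BranchNameHandler handler = new BranchNameHandler();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler.branchName;
        } catch (Exception e) {
            throw new IllegalStateException("XML could not be parsed", e);
        }
    }

    public static void main(String[] args) {
        String escaped =
            "<system><branch_name>B &amp; P Infotech Ltd.</branch_name></system>";
        String cdata =
            "<system><branch_name><![CDATA[B & P Infotech Ltd.]]></branch_name></system>";
        System.out.println(parseBranchName(escaped)); // B & P Infotech Ltd.
        System.out.println(parseBranchName(cdata));   // B & P Infotech Ltd.
    }
}
```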

You must replace your special characters with the forms that XML accepts. In your case, & should be replaced by &amp;

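@Override
public void characters(char[] ch, int start, int length)
        throws SAXException {
    content = String.copyValueOf(ch, start, length).trim();
    // Note: this callback only runs for text the parser has accepted;
    // a bare '&' in the file is rejected before it gets here.
    content = content.replace("&", "&amp;");
}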
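Because `characters()` only sees text the parser has already accepted, a bare `&` in the file has to be repaired in the raw text before parsing. If the producer of the file cannot be fixed, one workaround is a regular-expression pass over the raw XML. This is a heuristic sketch under stated assumptions: `AmpersandRepair` and `escapeBareAmpersands` are hypothetical names, anything that looks like `&name;` is left alone, and `&` inside CDATA sections or comments would also be rewritten.

```java
import java.util.regex.Pattern;

final class AmpersandRepair {

    // Matches an '&' that does not start a named entity (&name;)
    // or a numeric character reference (&#38; or &#x26;).
    private static final Pattern BARE_AMPERSAND = Pattern.compile(
        "&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)");

    // Hypothetical helper: escapes bare ampersands in raw XML text.
    static String escapeBareAmpersands(String rawXml) {
        return BARE_AMPERSAND.matcher(rawXml).replaceAll("&amp;");
    }

    public static void main(String[] args) {
        String raw = "<branch_name>B & P Infotech Ltd.</branch_name>";
        // prints <branch_name>B &amp; P Infotech Ltd.</branch_name>
        System.out.println(escapeBareAmpersands(raw));
    }
}
```

The repaired string can then be handed to the SAX parser through an `InputSource` wrapping a `StringReader`. Fixing the file at its source remains the more reliable option.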

The problem is that "&" is itself an escape character.

To fix this, replace the ampersand with its numeric character reference, i.e. "&#038;".
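The difference can be observed with a short check. This is a sketch with made-up names (`AmpersandCheck`, `tryParse`): it parses a snippet and reports either the text that was read or the position at which the parser rejected the input. Note that the empty `catch` block in the question hides exactly this exception, which is why `name` stays `null` without any error being shown.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

final class AmpersandCheck {

    // Parses the snippet and describes what happened.
    static String tryParse(String xml) {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        try {
            SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
            return "parsed: " + text;
        } catch (SAXParseException e) {
            // A bare '&' makes the document not well-formed.
            return "rejected at line " + e.getLineNumber()
                + ", column " + e.getColumnNumber() + ": " + e.getMessage();
        } catch (Exception e) {
            return "failed: " + e;
        }
    }

    public static void main(String[] args) {
        System.out.println(tryParse("<n>B & P</n>"));      // rejected ...
        System.out.println(tryParse("<n>B &#038; P</n>")); // parsed: B & P
        System.out.println(tryParse("<n>B &amp; P</n>"));  // parsed: B & P
    }
}
```

The exact wording of the error message depends on the parser implementation, so only the "rejected" prefix should be relied on.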
