XML parsing using a SAX parser in Java

I am trying to parse an RSS XML feed, but I am stuck on the description: my program stops reading the description content when it encounters an apostrophe ( ' ).

Code to parse the XML:

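import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;
// StringEscapeUtils comes from Apache Commons Lang; the package name depends on the version in use.

public class RSSAX {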

String channel_title="";

public void displayRSS()
{

    try {

        SAXParserFactory spf =  SAXParserFactory.newInstance();
        SAXParser sp = spf.newSAXParser();
        sp.parse("http://www.ronkaplansbaseballbookshelf.com/feed/podcast/", new RSSHandler());


    } catch (Exception e) {
        // TODO: handle exception
        System.out.println("Message is "+e.getMessage());
    }

}

private class RSSHandler extends DefaultHandler
{
    private boolean isItem = false;
    private String tagName=""; 

    @Override
    public void startElement(String uri, String localName, String qName,
            Attributes attributes) throws SAXException {
        this.tagName= qName;
        if(qName.equals("item"))
        {
            this.isItem=true;
        }

    }

    @Override
    public void endElement(String uri, String localName, String qName)
            throws SAXException {
         this.tagName="";
         if(qName.equals("item"))
         {
             System.out.println("========================");
             this.isItem=false;
         }


    }

    @Override
    public void characters(char[] ch, int start, int length)
            throws SAXException {

        if(this.isItem)
        {
            //System.out.println("tagname is "+this.tagName);
            if(this.tagName.equals("title"))
            {
                System.out.println("title is "+(new String(ch,start,length)));
                this.tagName="";
            }
            else if(this.tagName.equals("link"))
            {
                System.out.println("link is "+(new String(ch,start,length)));
                this.tagName="";
            }
            else if(this.tagName.equals("description"))
            {
                String test=(new String(ch,start,length)).replaceAll("\\<.*?>","");
                test=StringEscapeUtils.escapeXml(StringEscapeUtils.unescapeXml(test));
                System.out.println("description is "+test);
                this.tagName="";
            }
            else if(this.tagName.equals("comments"))
            {
                System.out.println("comment link is "+(new String(ch,start,length)));
                this.tagName="";
            }
            else if(this.tagName.equals("pubDate"))
            {
                System.out.println("pubDate is "+(new String(ch,start,length)));
                this.tagName="";
            }
            else if(this.tagName.equals("category"))
            {
                System.out.println("Category is "+(new String(ch,start,length)));
                this.tagName="";
            }
            else if(this.tagName.equals("content:encoded"))
            {
                System.out.println("content:encoded is "+(new String(ch,start,length)));
                //this.tagName="";
            }

        }

    }

}

}



Output:

title is The Bookshelf Conversation: Filip Bondy
link is http://www.ronkaplansbaseballbookshelf.com/2015/08/04/the-bookshelf-conversation-filip-bondy/
pubDate is Tue, 04 Aug 2015 14:31:45 +0000
comment link is http://www.ronkaplansbaseballbookshelf.com/2015/08/04/the-bookshelf-conversation-filip-bondy/#comments
Category is 2015 title Category is Author profile/interview by Ron Kaplan

description is My New Jersey landsman and veteran sportswriter Filip Bondy has crafted a fun volume on one of the most famous games in the history of the national pastime. Whenever there

The printed description stops at the apostrophe in "there's".

A SAX parser can break up text nodes any way it likes, and deliver the content in multiple calls to the characters() method. It's your job to reassemble the pieces.
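One way to reassemble the pieces is to append every characters() call to a buffer and only read the buffer in endElement(). The following is a minimal sketch, not the asker's code: the class name BufferingRssHandler and the helper parseDescriptions are made up for illustration, and it assumes that only leaf elements such as description carry text of interest (the buffer is reset at every start tag, so mixed content would be lost).

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the complete text of each <description> inside an <item>.
class BufferingRssHandler extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();
    private final List<String> descriptions = new ArrayList<>();
    private boolean inItem = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        if (qName.equals("item")) {
            inItem = true;
        }
        text.setLength(0); // start with an empty buffer for every element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // The parser may call this several times for one text node, so only accumulate here.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (inItem && qName.equals("description")) {
            descriptions.add(text.toString()); // the text is complete only at the end tag
        }
        if (qName.equals("item")) {
            inItem = false;
        }
        text.setLength(0);
    }

    List<String> getDescriptions() {
        return descriptions;
    }

    // Illustrative helper: parses an XML string and returns the item descriptions.
    static List<String> parseDescriptions(String xml) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        BufferingRssHandler handler = new BufferingRssHandler();
        parser.parse(new InputSource(new StringReader(xml)), handler);
        return handler.getDescriptions();
    }
}
```

The same pattern should carry over to the other tags in the question (title, link, pubDate and so on): print or store the buffered text in endElement() instead of printing inside characters().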

Alternatively, you can use a StAX parser. To force XMLStreamReader to return the text as a single string, set the coalescing property on the factory:

factory.setProperty("javax.xml.stream.isCoalescing", true);

With this property set, adjacent character data is returned as one string; see the XMLStreamReader.next() documentation.
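A fuller sketch of the StAX approach might look like the following. The class name StaxDescriptions and its read method are hypothetical, and the example assumes un-prefixed RSS element names (item, description); XMLInputFactory.IS_COALESCING is the constant for the "javax.xml.stream.isCoalescing" property used above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper: reads the text of each <description> inside an <item> with StAX.
class StaxDescriptions {
    static List<String> read(String xml) throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        // Ask the reader to merge adjacent character data (including CDATA) into one event.
        factory.setProperty(XMLInputFactory.IS_COALESCING, true);

        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        List<String> descriptions = new ArrayList<>();
        boolean inItem = false;
        String current = null; // local name of the element whose text is being read

        try {
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    current = reader.getLocalName();
                    if (current.equals("item")) {
                        inItem = true;
                    }
                } else if (event == XMLStreamConstants.CHARACTERS) {
                    if (inItem && "description".equals(current)) {
                        descriptions.add(reader.getText());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    if (reader.getLocalName().equals("item")) {
                        inItem = false;
                    }
                    current = null;
                }
            }
        } finally {
            reader.close();
        }
        return descriptions;
    }
}
```

XMLStreamReader.getElementText() is another option for text-only elements: called on a START_ELEMENT event, it returns the element's whole text content in one call.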
