简体   繁体   English

当节点内部文本是html时,Java解析xml文件

[英]Java parse xml file when node inner text is html

Right now I'm using SAXParser with my own handler, it can parse all node values except for the one that has type="html" 现在我正在使用SAXParser和我自己的处理程序,它可以解析除了type =“html”之外的所有节点值

My characters function is like this: 我的角色功能是这样的:

public void characters(char ch[], int start, int length) throws SAXException {
        if(content){
        String tmp = new String(ch, start, length);
        System.out.println("Content : " + tmp);
        content = false;
        }

And that particular node has the following format, which my output always just give me a bunch of \\n and nothing else. 并且该特定节点具有以下格式,我的输出总是只给我一堆\\ n而不是别的。

   <content type="html">

    &lt;img alt="" src="http://cdn2.sbnation.com/entry_photo_images/8767829/stranger-bad-robot-screencap_large.png" /&gt;


     &lt;p&gt;Bad Robot, the production company founded by geek culture hitmaker J.J. Abrams (&lt;i&gt;Lost&lt;/i&gt;, &lt;i&gt;Fringe&lt;/i&gt;, &lt;i&gt;Star Trek: Into Darkness&lt;/i&gt;, &lt;i&gt;Alias&lt;/i&gt;,&amp;nbsp;etc.), has released a&amp;nbsp;&lt;a href="http://youtu.be/FWaAZCaQXdo" target="_blank"&gt;mysterious new trailer&lt;/a&gt; titled "Stranger." The creepy and inscrutable video spot, posted by the official Bad Robot Twitter account this afternoon, features a starry sky; a long-haired, rope-bound man wandering along a desolate monochromatic shore line; and your garden variety, horrifying stitched-mouth person coming into focus. "Men are erased and reborn," intones a narrator that sounds a little like Leonard Nimoy.&lt;/p&gt;
     &lt;p&gt;&lt;/p&gt;



    </content>

You should use StringBuffer to store content as it's described in these topics: 您应该使用StringBuffer来存储内容,如以下主题中所述:

SAX parsing and special characters SAX解析和特殊字符

Unable to read special characters from xml using java 无法使用java从xml中读取特殊字符

You might be wrongfully assuming that the characters callback occurs only once in between startElement and endElement callbacks. 您可能错误地认为characters回调仅在startElementendElement回调之间发生一次。 It is actually called multiple times. 它实际上被多次调用。

Since you use the content boolean member to determine whether to print stuff or not and also set this same member to false inside characters callback, your condition is bound to be fulfilled only once, until you reset content (it is not clear where you do that). 由于您使用content布尔成员来确定是否打印东西,并在characters回调中将此同一成员设置为false ,因此您的条件必须只满足一次,直到重置content为止(不清楚您在何处执行此操作) )。

Here's an example that works with your XML just fine (assumes non-mixed content and Java programming language): 这是一个适合您的XML的示例(假设非混合内容和Java编程语言):

import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class TestSaxParser {

    public static void main(String[] args) throws ParserConfigurationException, SAXException, IOException {
        String xml = 
            "<content type=\"html\">\n" +
            "\n" +
            "    &lt;img alt=\"\" src=\"http://cdn2.sbnation.com/entry_photo_images/8767829/stranger-bad-robot-screencap_large.png\" /&gt;\n" +
            "\n" +
            "\n" +
            "     &lt;p&gt;Bad Robot, the production company founded by geek culture hitmaker J.J. Abrams (&lt;i&gt;Lost&lt;/i&gt;, &lt;i&gt;Fringe&lt;/i&gt;, &lt;i&gt;Star Trek: Into Darkness&lt;/i&gt;, &lt;i&gt;Alias&lt;/i&gt;,&amp;nbsp;etc.), has released a&amp;nbsp;&lt;a href=\"http://youtu.be/FWaAZCaQXdo\" target=\"_blank\"&gt;mysterious new trailer&lt;/a&gt; titled \"Stranger.\" The creepy and inscrutable video spot, posted by the official Bad Robot Twitter account this afternoon, features a starry sky; a long-haired, rope-bound man wandering along a desolate monochromatic shore line; and your garden variety, horrifying stitched-mouth person coming into focus. \"Men are erased and reborn,\" intones a narrator that sounds a little like Leonard Nimoy.&lt;/p&gt;\n" +
            "     &lt;p&gt;&lt;/p&gt;\n" +
            "\n" +
            "\n" +
            "\n" +
            "    </content>";

        MySaxHandler handler = new MySaxHandler();
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser parser = factory.newSAXParser();        
        InputSource source = new InputSource(new StringReader(xml));
        parser.parse(source, handler);
    }

    private static class MySaxHandler extends DefaultHandler {
        private StringBuilder content = new StringBuilder();

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) throws SAXException {
            content.setLength(0);
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            content.append(ch, start, length);
        }

        @Override
        public void endElement(String uri, String localName, String qName) throws SAXException {
            System.out.println(content.toString());
        }

    }    
}

Output: 输出:

    <img alt="" src="http://cdn2.sbnation.com/entry_photo_images/8767829/stranger-bad-robot-screencap_large.png" />


     <p>Bad Robot, the production company founded by geek culture hitmaker J.J. Abrams (<i>Lost</i>, <i>Fringe</i>, <i>Star Trek: Into Darkness</i>, <i>Alias</i>,&nbsp;etc.), has released a&nbsp;<a href="http://youtu.be/FWaAZCaQXdo" target="_blank">mysterious new trailer</a> titled "Stranger." The creepy and inscrutable video spot, posted by the official Bad Robot Twitter account this afternoon, features a starry sky; a long-haired, rope-bound man wandering along a desolate monochromatic shore line; and your garden variety, horrifying stitched-mouth person coming into focus. "Men are erased and reborn," intones a narrator that sounds a little like Leonard Nimoy.</p>
     <p></p>

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM