
DOM Parser receiving NullPointerException on pure HTML RSS post

I'm going to try to make this as clear as possible, although I'm not sure I'll succeed.

I've implemented a DOM parser in Android to parse a typical RSS feed, based on some of the code found here. It works fine for almost all of the feeds I've tried; however, I just ran into a NullPointerException on the line theString = nchild.item(j).getFirstChild().getNodeValue(); (my code is below) on one particular post in a feed from a Blogger site. I know it's only this post because I rewrote the loop to skip this single post, and the error disappeared and parsing continued just fine. Looking at this post in the actual RSS feed, it seems to be written entirely in HTML (as opposed to plain text), whereas the other posts that parsed successfully aren't.

Could this be the cause of the issue, or should I keep looking? And if it is, how would I go about solving it? Is there a way to ignore posts written this way? I've looked for alternative examples to compare against, but it seems that everyone has used the same base code for their tutorials.

The post I'm referring to is just a link, and a couple of lines of coloured text within <div> tags with some different fonts. I'd post it here, but I'm not sure the owner of the feed would want me to (I'll ask and update if able).

My parser:

    try {
        // Create required instances
        DocumentBuilderFactory dbf;
        dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();

        // Parse the xml
        Document doc = db.parse(new InputSource(url.openStream()));
        doc.getDocumentElement().normalize();

        // Get all <item> tags.
        NodeList nl = doc.getElementsByTagName("item");
        int length = nl.getLength();

        for (int i = 0; i < length; i++) {
            Node currentNode = nl.item(i);
            RSSItem _item = new RSSItem();

            NodeList nchild = currentNode.getChildNodes();
            int clength = nchild.getLength();

            for (int j = 1; j < clength; j = j + 2) {

                Node thisNode = nchild.item(j);
                String theString = null;
                String nodeName = thisNode.getNodeName();

                theString = nchild.item(j).getFirstChild().getNodeValue();
                if (theString != null) {
                    if ("title".equals(nodeName)) {
                        _item.setTitle(theString);
                    } else if ("description".equals(nodeName)) {
                        _item.setDescription(theString);
                    } else if ("pubDate".equals(nodeName)) {
                        String formatedDate = theString.replace(" +0000", "");
                        _item.setDate(formatedDate);
                    } else if ("author".equals(nodeName)) {
                        _item.setAuthor(theString);
                    }
                }
            }
            _feed.addItem(_item);
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    return _feed;
}
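The `j = 1; j = j + 2` stride in the loop above relies on how DOM represents pretty-printed XML: the whitespace between tags becomes `#text` nodes, so the element children end up at odd indices. The sketch below is illustrative only (the class name `ChildNodeLayout` and the two sample feeds are made up). It prints the child nodes of an `<item>` in a pretty-printed and a compact version of the same feed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ChildNodeLayout {
    // Parses an XML string, wrapping checked exceptions for brevity.
    static Document parse(String xml) {
        try {
            DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return db.parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Lists every child node of the first <item> as "index:nodeName".
    // Whitespace-only text nodes show up as "#text".
    static String[] describeItemChildren(String xml) {
        Node item = parse(xml).getElementsByTagName("item").item(0);
        NodeList children = item.getChildNodes();
        String[] out = new String[children.getLength()];
        for (int j = 0; j < children.getLength(); j++) {
            out[j] = j + ":" + children.item(j).getNodeName();
        }
        return out;
    }

    public static void main(String[] args) {
        String pretty = "<rss><channel><item>\n  <title>A</title>\n"
                + "  <description>B</description>\n</item></channel></rss>";
        String compact = "<rss><channel><item><title>A</title>"
                + "<description>B</description></item></channel></rss>";
        // prints "0:#text 1:title 2:#text 3:description 4:#text"
        System.out.println(String.join(" ", describeItemChildren(pretty)));
        // prints "0:title 1:description"
        System.out.println(String.join(" ", describeItemChildren(compact)));
    }
}
```

In the compact version, an odd-index walk would visit only `description` and skip `title` entirely, so that stride depends on how the feed happens to be formatted, not only on its content.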

As I mentioned, I changed the code to skip the (third) post causing the issue:

if (i != 3) {
    if (theString != null) {
        if ("title".equals(nodeName)) {
            _item.setTitle(theString);
        } else if ("description".equals(nodeName)) {
            _item.setDescription(theString);
        } else if ("pubDate".equals(nodeName)) {
            String formatedDate = theString.replace(" +0000", "");
            _item.setDate(formatedDate);
        } else if ("author".equals(nodeName)) {
            _item.setAuthor(theString);
        }
    }
}

This made everything work as desired, apart from skipping the third post. Any help with this is appreciated; I've been searching for a while with no luck. I'd post my logcat, but beyond the line I quoted at the start of this question it isn't very useful, because the trace goes back through an AsyncTask.

Oh, and one way I was thinking of solving it was to parse the description first instead of the title (rewriting the loop, of course) and check whether it was null before continuing the parse. That would be quite messy, though, so I'm looking for an alternative.

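Take a look at the XML you are trying to parse. I'm almost sure that, in the third post, one of the child elements has no child node of its own; that is, it's empty. For such an element getFirstChild() returns null, so calling getNodeValue() on the result throws a NullPointerException. For example, this element would trigger it: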

<Element></Element>
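To see this in isolation, here is a minimal, self-contained check (the class name `EmptyElementDemo` is hypothetical) that parses a one-element document and looks at its first child. For an empty element there is no child node, so `getFirstChild()` returns `null`.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class EmptyElementDemo {
    // Parses a one-element document and returns the root's first child (may be null).
    static Node firstChildOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getFirstChild();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // An element with text has a Text child whose value is that text.
        System.out.println(firstChildOf("<Element>text</Element>").getNodeValue()); // prints "text"

        // An empty element has no child at all.
        Node none = firstChildOf("<Element></Element>");
        System.out.println(none == null); // prints "true"
        // Calling none.getNodeValue() here would throw a NullPointerException,
        // the same failure as getFirstChild().getNodeValue() in the question.
    }
}
```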

So you must check that the node actually has a child before calling getNodeValue() on it. This is the line that fails:

theString = nchild.item(j).getFirstChild().getNodeValue();

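To avoid this, you could do something like:

Node firstChild = nchild.item(j).getFirstChild();
if (firstChild != null) {
    // Only read the value when the element actually has a child node
    theString = firstChild.getNodeValue();
    // ...and the rest of your code
}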
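Putting the null check into the full loop might look like the sketch below. It is not the asker's original class: a `Map<String, String>` stands in for the question's `RSSItem` (a hypothetical simplification), and `SafeItemParser`, `firstChildText` and `parseItems` are made-up names. It also visits every child and filters on `Node.ELEMENT_NODE` instead of stepping `j` by 2, so whitespace text nodes no longer have to line up with odd indices.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SafeItemParser {
    // Returns the value of the element's first child, or null if it has none.
    static String firstChildText(Node element) {
        Node first = element.getFirstChild();
        return (first != null) ? first.getNodeValue() : null;
    }

    // Collects title/description/pubDate/author of every <item>.
    static List<Map<String, String>> parseItems(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse feed", e);
        }
        List<Map<String, String>> items = new ArrayList<>();
        NodeList nl = doc.getElementsByTagName("item");
        for (int i = 0; i < nl.getLength(); i++) {
            Map<String, String> item = new HashMap<>();
            NodeList children = nl.item(i).getChildNodes();
            for (int j = 0; j < children.getLength(); j++) {
                Node child = children.item(j);
                // Skip whitespace text nodes and anything else that isn't an element.
                if (child.getNodeType() != Node.ELEMENT_NODE) continue;
                String text = firstChildText(child);
                if (text == null) continue; // empty element: nothing to store
                String name = child.getNodeName();
                if ("pubDate".equals(name)) {
                    text = text.replace(" +0000", "");
                }
                if ("title".equals(name) || "description".equals(name)
                        || "pubDate".equals(name) || "author".equals(name)) {
                    item.put(name, text);
                }
            }
            items.add(item);
        }
        return items;
    }

    public static void main(String[] args) {
        String feed = "<rss><channel>"
                + "<item><title>First</title><description>Plain text</description></item>"
                + "<item><title>Second</title><description></description></item>"
                + "</channel></rss>";
        // The second item's empty <description> is skipped instead of crashing.
        System.out.println(parseItems(feed));
    }
}
```

As a design alternative, `Node.getTextContent()` (DOM Level 3) returns an empty string rather than null for an empty element and concatenates all descendant text, which may suit fields whose content is split across several child nodes.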
