
How to read escape characters as text in Java?

public List<String> readRSS(String feedUrl, String openTag, String closeTag)
            throws IOException, MalformedURLException {

        URL url = new URL(feedUrl);
        BufferedReader reader = new BufferedReader(new InputStreamReader(url.openStream()));

        String currentLine;
        List<String> tempList = new ArrayList<String>();
        while ((currentLine = reader.readLine()) != null) {
            Integer tagEndIndex = 0;
            Integer tagStartIndex = 0;
            while (tagStartIndex >= 0) {
                tagStartIndex = currentLine.indexOf(openTag, tagEndIndex);
                if (tagStartIndex >= 0) {
                    tagEndIndex = currentLine.indexOf(closeTag, tagStartIndex);
                    tempList.add(currentLine.substring(tagStartIndex + openTag.length(), tagEndIndex) + "\n");
                }
            }
        }
        if (tempList.size() > 0) {
            if(openTag.contains("title")){
                tempList.remove(0);
                tempList.remove(0);
            }
            else if(openTag.contains("desc")){
                tempList.remove(0);
            }
        }
        return tempList;
    }

I wrote this code to read an RSS feed. It all works fine, but when the parser finds a character reference like &#xD; it breaks. This is because it can't find the closing tag, because the XML is escaped.

I don't know how to fix this in my code. Could anyone help me fix this issue?

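The problem is that &#xD; is the XML character reference for a carriage return, and in your feed it is followed by a line break, so the start and end tags wind up on different lines. Because you are reading line by line, the code you have cannot find the closing tag on the line where the element starts.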
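
To make the failure concrete, here is a minimal sketch that mirrors the extraction step from the question on a single line. The class name SplitTagDemo and the helper extract are made up for illustration; the input line is taken from the sample feed further down. When the closing tag is missing, indexOf returns -1 and substring rejects the range.

```java
public class SplitTagDemo {

    // Hypothetical helper: same indexOf/substring logic as the question's loop body.
    static String extract(String line, String openTag, String closeTag) {
        int start = line.indexOf(openTag);
        int end = line.indexOf(closeTag, start); // -1 when the close tag is on a later line
        return line.substring(start + openTag.length(), end);
    }

    public static void main(String[] args) {
        // Both tags on one line: extraction works.
        System.out.println(extract("<title>another title 1</title>", "<title>", "</title>"));

        // Element split across lines: only the first half is visible to readLine().
        try {
            extract("<title>another &#xD;", "<title>", "</title>");
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("close tag not on this line: " + e.getMessage());
        }
    }
}
```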

You can try something like this:

// Replaces the outer while loop in readRSS; currentLine, reader, tempList,
// openTag and closeTag are the variables from the question.
StringBuilder fullLine = new StringBuilder();

while ((currentLine = reader.readLine()) != null) {
    int tagStartIndex = currentLine.indexOf(openTag);
    // indexOf treats a negative fromIndex as 0, so this is safe without a start tag
    int tagEndIndex = currentLine.indexOf(closeTag, tagStartIndex);

    if (tagStartIndex != -1 && tagEndIndex != -1) {
        // both tags on the same line: process the whole line
        tempList.add(currentLine);
        fullLine = new StringBuilder();
    } else if (tagStartIndex == -1 && tagEndIndex == -1 && fullLine.length() > 0) {
        // no tags on this line but the buffer has been started:
        // the line is part of a larger element, so append it
        fullLine.append(currentLine);
    } else if (tagStartIndex != -1 && tagEndIndex == -1) {
        // start tag without an end tag: begin a new buffer with this line
        fullLine = new StringBuilder(currentLine);
    } else if (tagEndIndex != -1 && tagStartIndex == -1) {
        // end tag without a start tag: complete the buffer, then process it
        fullLine.append(currentLine);
        tempList.add(fullLine.toString());
        fullLine = new StringBuilder();
    }
}

Given this sample input:

<title>another &#xD;
title 0</title>
<title>another title 1</title>
<title>another title 2</title>
<title>another title 3</title>
<desc>description 0</desc>
<desc>another &#xD;
description 1</desc>
<title>another title 4</title>
<title>another &#xD;
another line in between &#xD;
title 5</title>

The full lines in tempList for title become:

<title>another &#xD;title 0</title>
<title>another title 1</title>
<title>another title 2</title>
<title>another title 3</title>
<title>another title 4</title>
<title>another &#xD;another line in between &#xD;title 5</title>

And for desc:

<desc>description 0</desc>
<desc>another &#xD;description 1</desc>

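To check the approach against the sample input, here is a self-contained sketch of the same buffering idea that reads from a string instead of a URL. The class name MultiLineTagReader and the methods collect and collectFromText are assumptions made for this example, not part of the original code. It is also slightly stricter than the loop above: a line that contains only a closing tag is ignored unless a buffer has already been started.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

public class MultiLineTagReader {

    // Collects each element delimited by openTag/closeTag, joining physical
    // lines (without a separator) when the element spans more than one line.
    static List<String> collect(BufferedReader reader, String openTag, String closeTag)
            throws IOException {
        List<String> result = new ArrayList<>();
        StringBuilder pending = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            boolean hasOpen = line.contains(openTag);
            boolean hasClose = line.contains(closeTag);
            if (hasOpen && hasClose) {
                result.add(line);            // whole element on one line
                pending.setLength(0);
            } else if (hasOpen) {
                pending.setLength(0);        // element starts here
                pending.append(line);
            } else if (pending.length() > 0) {
                pending.append(line);        // continuation of a started element
                if (hasClose) {
                    result.add(pending.toString());
                    pending.setLength(0);
                }
            }
        }
        return result;
    }

    // Convenience wrapper for in-memory text, so callers need no checked exception.
    static List<String> collectFromText(String text, String openTag, String closeTag) {
        try (BufferedReader reader = new BufferedReader(new StringReader(text))) {
            return collect(reader, openTag, closeTag);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
                "<title>another &#xD;",
                "title 0</title>",
                "<title>another title 1</title>",
                "<desc>another &#xD;",
                "description 1</desc>");
        collectFromText(sample, "<title>", "</title>").forEach(System.out::println);
        collectFromText(sample, "<desc>", "</desc>").forEach(System.out::println);
    }
}
```

Like the loop it is based on, this sketch assumes at most one element of the requested kind per line and keeps the surrounding tags in the result.
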
You should test this approach for performance on your full RSS feed. Also note that character references such as &#xD; are not decoded; they stay in the output exactly as written.
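
If you want the references turned back into real characters, one option is to decode them after the lines have been joined. The following is a minimal sketch, assuming only numeric references (decimal like &#13; or hexadecimal like &#xD;) need handling; the class name CharRefDecoder and the method decode are hypothetical, and named entities such as &amp; are deliberately left alone.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharRefDecoder {

    // Group 1 captures hex digits (&#xD;), group 2 captures decimal digits (&#13;).
    private static final Pattern NUMERIC_REF =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));");

    // Replaces each numeric character reference with the character it denotes.
    // Out-of-range values are not handled here and would raise an exception.
    static String decode(String text) {
        Matcher matcher = NUMERIC_REF.matcher(text);
        // StringBuffer because Matcher.appendReplacement only accepts it before Java 9.
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            int codePoint = matcher.group(1) != null
                    ? Integer.parseInt(matcher.group(1), 16)
                    : Integer.parseInt(matcher.group(2), 10);
            String decoded = new String(Character.toChars(codePoint));
            matcher.appendReplacement(out, Matcher.quoteReplacement(decoded));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String decoded = decode("another &#xD;title 0");
        // The carriage return is now a real character at index 8.
        System.out.println((int) decoded.charAt(8));
    }
}
```

For a complete feed, a real XML parser such as the JDK's javax.xml.parsers.DocumentBuilder resolves these references as part of parsing, which may be the more robust route than scanning lines by hand.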
