public List<String> readRSS(String feedUrl, String openTag, String closeTag)
        throws IOException, MalformedURLException {
    URL url = new URL(feedUrl);
    BufferedReader reader = new BufferedReader(new InputStreamReader(url.openStream()));
    String currentLine;
    List<String> tempList = new ArrayList<String>();
    while ((currentLine = reader.readLine()) != null) {
        Integer tagEndIndex = 0;
        Integer tagStartIndex = 0;
        while (tagStartIndex >= 0) {
            tagStartIndex = currentLine.indexOf(openTag, tagEndIndex);
            if (tagStartIndex >= 0) {
                tagEndIndex = currentLine.indexOf(closeTag, tagStartIndex);
                tempList.add(currentLine.substring(tagStartIndex + openTag.length(), tagEndIndex) + "\n");
            }
        }
    }
    if (tempList.size() > 0) {
        if (openTag.contains("title")) {
            tempList.remove(0);
            tempList.remove(0);
        } else if (openTag.contains("desc")) {
            tempList.remove(0);
        }
    }
    return tempList;
}
I wrote this code to read an RSS feed. It works fine, except that when the parser runs into a particular special character, it breaks. This happens because it can't find the closing tags, since the XML is escaped.
I don't know how to fix this in my code. Could anyone help me fix this issue?
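A minimal way to reproduce the failure, assuming the closing tag has been pushed onto the following line: the hypothetical class SplitTagDemo below runs the same inner loop as readRSS (without the appended "\n") on one line at a time. When closeTag is missing from the line, indexOf returns -1 and substring(begin, -1) throws a StringIndexOutOfBoundsException, which is presumably the error you are seeing.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo: the inner scanning loop from readRSS, applied to one line.
class SplitTagDemo {

    static List<String> scanLine(String line, String openTag, String closeTag) {
        List<String> found = new ArrayList<>();
        int tagEndIndex = 0;
        int tagStartIndex = 0;
        while (tagStartIndex >= 0) {
            tagStartIndex = line.indexOf(openTag, tagEndIndex);
            if (tagStartIndex >= 0) {
                tagEndIndex = line.indexOf(closeTag, tagStartIndex);
                // If closeTag is on the next line, tagEndIndex is -1 here and
                // substring(...) throws StringIndexOutOfBoundsException.
                found.add(line.substring(tagStartIndex + openTag.length(), tagEndIndex));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Opening and closing tags on the same line: works.
        System.out.println(scanLine("<title>one line</title>", "<title>", "</title>"));
        // Closing tag missing from this line: throws.
        try {
            scanLine("<title>another ", "<title>", "</title>");
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("split title: " + e.getClass().getSimpleName());
        }
    }
}
```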
The problem is that the special character you mention is a line break, so your start and end tags end up on different lines. If you read the feed line by line, the code you have will not work.
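A different fix for the same cause, separate from the loop this answer proposes, is to stop scanning line by line: read the whole feed into one string first and search for tags across line boundaries. The sketch below assumes the feed fits comfortably in memory; WholeTextScan, readAll and extract are hypothetical names, and plain indexOf still will not match an opening tag that carries attributes.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not from the answer): scan the whole document at once,
// so a line break inside an element no longer separates its tags.
class WholeTextScan {

    // Read everything, restoring the '\n' that readLine() strips.
    static String readAll(BufferedReader reader) throws IOException {
        StringBuilder text = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            text.append(line).append('\n');
        }
        return text.toString();
    }

    // Collect the text between each openTag/closeTag pair, in document order.
    static List<String> extract(String text, String openTag, String closeTag) {
        List<String> contents = new ArrayList<>();
        int from = 0;
        while (true) {
            int start = text.indexOf(openTag, from);
            if (start < 0) {
                break; // no more opening tags
            }
            int contentStart = start + openTag.length();
            int end = text.indexOf(closeTag, contentStart);
            if (end < 0) {
                break; // unterminated element: stop instead of throwing
            }
            contents.add(text.substring(contentStart, end));
            from = end + closeTag.length();
        }
        return contents;
    }

    public static void main(String[] args) throws IOException {
        String sample = "<title>another \ntitle 0</title>\n<title>another title 1</title>\n";
        String text = readAll(new BufferedReader(new StringReader(sample)));
        System.out.println(extract(text, "<title>", "</title>").size()); // prints 2
    }
}
```

Reading everything up front trades memory for simplicity; the line-buffering loop that follows avoids holding the whole feed at once.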
You can try something like this:
// Replaces the outer while loop in readRSS; reader, currentLine, tempList,
// openTag and closeTag are the variables declared there.
StringBuilder fullLine = new StringBuilder();
while ((currentLine = reader.readLine()) != null) {
    int tagStartIndex = currentLine.indexOf(openTag, 0);
    // a negative fromIndex is treated as 0, so this also works when tagStartIndex is -1
    int tagEndIndex = currentLine.indexOf(closeTag, tagStartIndex);
    if (tagStartIndex != -1 && tagEndIndex != -1) {
        // both tags on the same line: process the whole line
        tempList.add(currentLine);
        fullLine.setLength(0);
    } else if (tagStartIndex == -1 && tagEndIndex == -1 && fullLine.length() > 0) {
        // no tags on this line but a buffer has been started:
        // the line is part of a larger element, so add it to the buffer
        fullLine.append(currentLine);
    } else if (tagStartIndex != -1 && tagEndIndex == -1) {
        // start tag but no end tag: begin a new buffer with this line
        fullLine.setLength(0);
        fullLine.append(currentLine);
    } else if (tagEndIndex != -1 && tagStartIndex == -1) {
        // end tag but no start tag: add the line to the current buffer,
        // then process the buffer
        fullLine.append(currentLine);
        tempList.add(fullLine.toString());
        fullLine.setLength(0);
    }
}
Given this sample input:
<title>another 
title 0</title>
<title>another title 1</title>
<title>another title 2</title>
<title>another title 3</title>
<desc>description 0</desc>
<desc>another 
description 1</desc>
<title>another title 4</title>
<title>another 
another line in between 
title 5</title>
The full lines in tempList for title become:
<title>another 
title 0</title>
<title>another title 1</title>
<title>another title 2</title>
<title>another title 3</title>
<title>another title 4</title>
<title>another 
another line in between 
title 5</title>
And for desc:
<desc>description 0</desc>
<desc>another 
description 1</desc>
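To try the buffering loop on this sample without a network connection, here is a sketch that wraps it as a self-contained method. LineBufferCollector, collect and appendLine are hypothetical names, and one deliberate difference is assumed: the sketch puts a '\n' before each continuation line so that buffered entries keep their line breaks as in the output listed above, whereas the loop in the answer concatenates the lines directly because readLine() drops the line terminator.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical self-contained version of the line-buffering loop.
class LineBufferCollector {

    static List<String> collect(BufferedReader reader, String openTag, String closeTag)
            throws IOException {
        List<String> tempList = new ArrayList<>();
        StringBuilder fullLine = new StringBuilder();
        String currentLine;
        while ((currentLine = reader.readLine()) != null) {
            int tagStartIndex = currentLine.indexOf(openTag);
            int tagEndIndex = currentLine.indexOf(closeTag, Math.max(tagStartIndex, 0));
            if (tagStartIndex != -1 && tagEndIndex != -1) {
                // Both tags on this line: keep the line as-is.
                tempList.add(currentLine);
                fullLine.setLength(0);
            } else if (tagStartIndex != -1) {
                // Opening tag only: start a new buffer with this line.
                fullLine.setLength(0);
                fullLine.append(currentLine);
            } else if (tagEndIndex != -1) {
                // Closing tag only: finish the buffered element.
                appendLine(fullLine, currentLine);
                tempList.add(fullLine.toString());
                fullLine.setLength(0);
            } else if (fullLine.length() > 0) {
                // No tags, but inside an element: keep buffering.
                appendLine(fullLine, currentLine);
            }
        }
        return tempList;
    }

    // Re-insert the line break that readLine() removed (assumed design choice).
    private static void appendLine(StringBuilder buffer, String line) {
        if (buffer.length() > 0) {
            buffer.append('\n');
        }
        buffer.append(line);
    }

    public static void main(String[] args) throws IOException {
        String sample = "<title>another \ntitle 0</title>\n<desc>another \ndescription 1</desc>\n";
        for (String entry : collect(new BufferedReader(new StringReader(sample)), "<desc>", "</desc>")) {
            System.out.println(entry);
        }
    }
}
```

Reusing one StringBuilder with setLength(0) has the same effect as allocating a new StringBuffer each time; StringBuilder is used because the buffer is never shared between threads.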
Test this approach for performance against your full RSS feed. Also note that the special characters will not be escaped.
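If you need the special characters decoded rather than left as raw text, one option is to swap the string scanning for a different technique: the DOM parser that ships with the JDK (javax.xml.parsers). It decodes entity and character references and does not care where line breaks fall inside an element. The sketch below parses from a string so it can be tested; DocumentBuilder.parse(String uri) also accepts a URL string such as feedUrl. DomTextReader and elementTexts are hypothetical names, and the channel and image titles that readRSS skips are still included, so the caller would need to drop them the same way.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper: let a real XML parser find the elements and decode their text.
class DomTextReader {

    static List<String> elementTexts(String xml, String tagName)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        // All elements with this tag name, in document order.
        NodeList nodes = doc.getElementsByTagName(tagName);
        List<String> texts = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            // getTextContent() returns decoded text: references such as &amp;
            // are already replaced, and line breaks inside the element are kept.
            texts.add(nodes.item(i).getTextContent());
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<rss><title>A &amp; B</title><title>another \ntitle 0</title></rss>";
        for (String title : elementTexts(xml, "title")) {
            System.out.println(title);
        }
    }
}
```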