Android - Filtering HTML tags from an XML (RSS/Atom) feed using regular expressions

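I am developing a news reader app for my website, http://www.werchelsea.com/, which gets the latest news from the site's feed: http://www.werchelsea.com/feed/atom/. I fetch the feed successfully and convert it to a string. My main problem now is that the feed descriptions contain data with HTML tags, for example: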

<p>It was Raul Meireles who came from the Merseyside to London to complete his move from Liverpool to Chelsea on the dead line day of the summer transfer window last year, when Chelsea failed to sign the highly-rated midfielder, Luka Modric. Chelsea were left with no other choice but to sign the Portuguese midfielder.</p>
<p>Meireles was a regular starter under the management of Villas-Boas, he really enjoyed working under

<a href='http://www.werchelsea.com/2012/09/05/time-to-say-good-bye-to-raul-meireles/303777_153113331443746_1122718871_n/' title='303777_153113331443746_1122718871_n'><img width="150" height="150" src="http://www.werchelsea.com/wp-content/uploads/2012/09/303777_153113331443746_1122718871_n-150x150.jpg" class="attachment-thumbnail" alt="Meireles first training session with Chelsea football club" title="303777_153113331443746_1122718871_n" /></a>

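I tried to replace all of these tags with a regular expression, but for some reason I cannot come up with an RE that matches every kind of HTML tag. This is what I used for the replacement: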

protected String doInBackground(String... arg0) {

    String response="";
    try{
     URL feedwebsite=new URL(feedURL);
     SAXParserFactory spf=SAXParserFactory.newInstance();
     SAXParser sp = spf.newSAXParser();
     XMLHandler feedHandler=new XMLHandler();
     XMLReader feedReader=sp.getXMLReader();
     feedReader.setContentHandler(feedHandler);
     InputSource is=new InputSource(feedwebsite.openStream());
     feedReader.parse(is);
     response=feedHandler.getParsedFeed().replaceAll("<"+"[0-9a-zA-Z]+"+">","_").replaceAll("</"+"[0-9a-zA-Z]+"+">","-").replaceAll("<"+"[0-9a-zA-Z]+"+"/>",".");  
    }
    catch (Exception e)
    {
        response="Cannot Connect to the server.Please Check your Wifi/Data   Connection.";
        e.printStackTrace();
    }

    return response;
}

Is replacing on the string with an RE the right approach, or is there another way to do this? Please help.

To match an HTML tag (opening or closing), use the following regular expression:

<[^>]+?>
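The patterns in the question only match tags made of a bare name, such as <p> or </p>, so any tag that carries attributes (the <a href=...> and <img ... /> in the sample) slips through. A minimal sketch of applying the expression above in Java follows. The class name TagStripper, the method stripTags, and the sample string are made up for illustration and are not part of the original code.

```java
import java.util.regex.Pattern;

public class TagStripper {
    // Regex from the answer: '<', then one or more characters that are not '>', then '>'.
    // Compiled once so it can be reused for every feed entry.
    private static final Pattern TAG = Pattern.compile("<[^>]+?>");

    /** Removes every substring that looks like an HTML tag (opening, closing, or self-closing). */
    public static String stripTags(String html) {
        return TAG.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        // Hypothetical sample shaped like the feed content in the question.
        String entry = "<p>Chelsea signed <a href='http://example.com/' title='x'>Meireles</a>.</p>"
                + "<img width=\"150\" src=\"a.jpg\" />";
        System.out.println(stripTags(entry)); // prints "Chelsea signed Meireles."
    }
}
```

In the question's doInBackground, this would presumably replace the three chained replaceAll calls with a single replaceAll("<[^>]+?>", "") on the parsed feed string. Note that this is a pattern match, not an HTML parser: an attribute value that itself contains '>' would cut a match short, and character entities such as &amp; are left untouched.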

