How to use regular expressions to remove some HTML tags from a string in Java

I wrote some code to read news from an XML file (a feed), and I have to display the description of each item in my list view. I used this piece of code to remove the HTML tags that exist inside the description tag:

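else if ("description".equals(tagName)) {
    // Read the raw text of the <description> element (may contain HTML markup).
    sourcedescription = parser.nextText();
    // Html.fromHtml handles real HTML tags; {iframe} placeholders are not HTML and pass through.
    description = Html.fromHtml(sourcedescription).toString();
    Log.d("msg", description);
    feedDescription.add(description);
}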

For some items I succeeded in displaying the description without tags, i.e. in a readable form, but I failed to remove all the tags for other items that contain the {iframe} {/iframe} tag. I think this tag exists in the description tags of the items that have "no description":

<description><![CDATA[<p>{iframe height="600"}<a href="http://admreg.yu.edu.jo/index.php?option=com_content&view=article&id=606:------20132014&catid=87:2011-01-25-18-12-08&Itemid=438">http://admreg.yu.edu.jo/index.php?option=com_content&view=article&id=606:------20132014&catid=87:2011-01-25-18-12-08&Itemid=438</a><span style="line-height: 1.3em;">{/iframe}</span></p>]]></description>

My question is: how can I remove the iframe tag using regular expressions?

A possible solution would be:

    String regexp = "\\{/?iframe.*?\\}";
    String text = "<description><![CDATA[<p>{iframe height=\"600\"}<a href=\"http://admreg.yu.edu.jo/index.php?option=com_content&view=article&id=606:------20132014&catid=87:2011-01-25-18-12-08&Itemid=438\">http://admreg.yu.edu.jo/index.php?option=com_content&view=article&id=606:------20132014&catid=87:2011-01-25-18-12-08&Itemid=438</a><span style=\"line-height: 1.3em;\">{/iframe}</span></p>]]></description>";
    System.out.println(text.replaceAll(regexp, ""));

If you also want to remove the content inside the iframe tag, use this regexp instead:

text.replaceAll("\\{iframe .*?\\}.*?\\{/iframe\\}", "")
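
To make the difference between the two expressions concrete, here is a minimal self-contained sketch. The class name `IframePlaceholderDemo`, its method names, and the shortened sample string are illustrative assumptions, not part of the original feed or code.

```java
class IframePlaceholderDemo {
    // Matches only an opening {iframe ...} or a closing {/iframe} placeholder.
    static final String TAGS_ONLY = "\\{/?iframe.*?\\}";
    // Matches the opening placeholder, the text after it, and the closing placeholder.
    static final String TAGS_AND_CONTENT = "\\{iframe .*?\\}.*?\\{/iframe\\}";

    static String stripTags(String text) {
        return text.replaceAll(TAGS_ONLY, "");
    }

    static String stripTagsAndContent(String text) {
        return text.replaceAll(TAGS_AND_CONTENT, "");
    }

    public static void main(String[] args) {
        // Hypothetical, shortened stand-in for the feed description.
        String sample = "<p>{iframe height=\"600\"}<a href=\"http://example.com\">link</a>{/iframe}</p>";
        System.out.println(stripTags(sample));           // <p><a href="http://example.com">link</a></p>
        System.out.println(stripTagsAndContent(sample)); // <p></p>
    }
}
```

Note that the second expression requires a space after `iframe`, so a bare `{iframe}` opening tag with no attributes would not be matched by it.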

Use these regexes:

\{iframe[^\}]*\}   // to delete the opening tag
\{/iframe[^\}]*\}  // to delete the closing tag

These regexes won't delete what is inside the iframe.
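
As a rough sketch of how these two expressions could be applied in Java, the snippet below precompiles them with `java.util.regex.Pattern` and runs them one after the other. The class name `IframeTagPatterns` and the method `removeIframeTags` are hypothetical names chosen for illustration.

```java
import java.util.regex.Pattern;

class IframeTagPatterns {
    // Opening placeholder: "{iframe", any characters except '}', then '}'.
    private static final Pattern OPENING = Pattern.compile("\\{iframe[^\\}]*\\}");
    // Closing placeholder: "{/iframe", any characters except '}', then '}'.
    private static final Pattern CLOSING = Pattern.compile("\\{/iframe[^\\}]*\\}");

    // Removes both placeholders but keeps whatever sits between them.
    static String removeIframeTags(String text) {
        String withoutOpening = OPENING.matcher(text).replaceAll("");
        return CLOSING.matcher(withoutOpening).replaceAll("");
    }
}
```

Compiling the patterns once avoids recompiling them on every call, which may matter if every item in a feed is processed this way.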

Note: Use a parser if you have the option. That said, for a quick and dirty solution:

str.replaceAll("\\{/?iframe.*?\\}", "");

To remove the content between these tags as well:

str.replaceAll("\\{iframe.*?\\}.*?\\{/iframe\\}", "")
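
One caveat with the content-removing expression: by default `.` in `java.util.regex` does not match line terminators, so an iframe block that spans several lines would be left untouched. A minimal sketch, assuming the description text may contain newlines, adds the embedded `(?s)` (DOTALL) flag. The class and method names are hypothetical.

```java
class IframeBlockRemover {
    // (?s) lets '.' match line terminators too, so multi-line blocks are removed.
    static String removeIframeBlocks(String str) {
        return str.replaceAll("(?s)\\{iframe.*?\\}.*?\\{/iframe\\}", "");
    }
}
```

For single-line descriptions like the sample in the question, this behaves the same as the expression without the flag.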

HTML is not a regular language. Don't use RegEx with it, or you'll die.
