How to remove all inline styles and other attributes from HTML elements using Jsoup?

How can I remove all inline styles and other attributes (class, onclick) from HTML elements using Jsoup?

Sample input:

<div style="padding-top:25px;" onclick="javascript:alert('hi');">
This is a sample div <span class='sampleclass'> This is a sample span </span>
</div>

Sample output:

<div>This is a sample div <span> This is a sample span </span> </div>

My code (is this the right way, or is there a better approach?):

Document doc = Jsoup.parse(html);
Elements el = doc.getAllElements();
for (Element e : el) {
    Attributes at = e.attributes();
    for (Attribute a : at) {    
        e.removeAttr(a.getKey());    
    }
}

Yes, one method is indeed to iterate through the elements and call removeAttr() on each attribute.
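One caveat with the loop in the question: it removes attributes from e.attributes() while iterating over that same collection, which, depending on the jsoup version, may throw a ConcurrentModificationException or skip some attributes. Below is a minimal sketch of a safer variant that copies the attribute keys first. The class name AttributeStripper and the method stripAllAttributes are hypothetical names used only for illustration, and parsing the input as a body fragment is an assumption about how the HTML arrives.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Attribute;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class AttributeStripper {

    // Hypothetical helper (not part of jsoup): removes every attribute
    // from every element and returns the resulting body HTML.
    static String stripAllAttributes(String html) {
        Document doc = Jsoup.parseBodyFragment(html);
        for (Element e : doc.getAllElements()) {
            // Snapshot the keys first, so the attribute collection is not
            // modified while it is being iterated.
            List<String> keys = new ArrayList<>();
            for (Attribute a : e.attributes()) {
                keys.add(a.getKey());
            }
            for (String key : keys) {
                e.removeAttr(key);
            }
        }
        return doc.body().html();
    }

    public static void main(String[] args) {
        String html = "<div style=\"padding-top:25px;\" onclick=\"javascript:alert('hi');\">"
                + "This is a sample div <span class='sampleclass'> This is a sample span </span></div>";
        System.out.println(stripAllAttributes(html));
    }
}
```

Unlike Jsoup.clean(), this keeps every tag and only drops the attributes. If your jsoup version provides clearAttributes() on nodes, calling that on each element should achieve the same thing in a single call.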

An alternative using jsoup is the Whitelist class (see the docs), which can be passed to Jsoup.clean() to remove every tag and attribute that is not explicitly allowed.

For example:

String html = "<html><head></head><body><div style='padding-top:25px;' onclick=\"javascript:alert('hi');\">This is a sample div <span class='sampleclass'>This is a simple span</span></div></body></html>";

Whitelist wl = Whitelist.simpleText();
wl.addTags("div", "span"); // add additional tags here as necessary
String clean = Jsoup.clean(html, wl);
System.out.println(clean);

This results in the following output:

<div>
 This is a sample div 
 <span>This is a simple span</span>
</div>
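
In newer jsoup releases the Whitelist class has been renamed to Safelist, so the snippet above may not compile against a current version. The following is a minimal sketch of the same idea with Safelist, assuming a jsoup version that ships org.jsoup.safety.Safelist. The class name SafelistCleaner and the method keepDivAndSpan are hypothetical. It starts from Safelist.none() rather than simpleText(), so only div and span survive (simpleText() additionally allows b, em, i, strong and u).

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

class SafelistCleaner {

    // Hypothetical helper (not part of jsoup): keeps only <div> and <span>
    // elements and strips all of their attributes.
    static String keepDivAndSpan(String html) {
        // Safelist.none() allows no tags at all; addTags() then allows the
        // named tags, but none of their attributes.
        Safelist safelist = Safelist.none().addTags("div", "span");
        return Jsoup.clean(html, safelist);
    }

    public static void main(String[] args) {
        String html = "<div style='padding-top:25px;' onclick=\"javascript:alert('hi');\">"
                + "This is a sample div <span class='sampleclass'>This is a simple span</span></div>";
        System.out.println(keepDivAndSpan(html));
    }
}
```

Note that Jsoup.clean() drops any tag that is not on the list while keeping its text, so this approach fits best when you want to restrict the tags as well as the attributes.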
