Libraries for HTML sanitization

Question

I'm looking for an HTML sanitizer that I can call through an API to sanitize strings I get from my web app. Are there any useful, easy-to-use libraries available? Does anyone know of one or two?

I don't need anything big; it just has to find unclosed tags and close them.
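To make the requirement concrete, here is a minimal sketch of what "find unclosed tags and close them" could look like using only the JDK. The class name `TagCloser` and the method `closeOpenTags` are hypothetical, the regex ignores comments and attribute values that contain `>`, and the code only balances tags: it does nothing against XSS, so it is not a replacement for the libraries suggested in the answers.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: appends closing tags for elements left open.
class TagCloser {
    // A few void elements that never take a closing tag (list is not exhaustive).
    private static final Set<String> VOID =
            Set.of("br", "hr", "img", "input", "meta", "link");

    // Matches <name ...>, </name> and <name .../>.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*?(/?)>");

    static String closeOpenTags(String html) {
        Deque<String> open = new ArrayDeque<>();
        Matcher m = TAG.matcher(html);
        while (m.find()) {
            String name = m.group(2).toLowerCase();
            boolean closing = !m.group(1).isEmpty();
            boolean selfClosing = !m.group(3).isEmpty();
            if (closing) {
                // Drop the most recently opened element with this name, if any.
                open.removeFirstOccurrence(name);
            } else if (!selfClosing && !VOID.contains(name)) {
                open.push(name);
            }
        }
        // Close whatever is still open, innermost element first.
        StringBuilder out = new StringBuilder(html);
        while (!open.isEmpty()) {
            out.append("</").append(open.pop()).append('>');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(closeOpenTags("<p><b>hi"));
    }
}
```

A real HTML parser handles many cases this sketch does not (implied end tags, misnested elements, malformed markup), which is the main reason to prefer a library.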

Answer 1

https://github.com/OWASP/java-html-sanitizer is now marked ready for production use.

A fast and easy-to-configure HTML sanitizer written in Java which lets you include HTML authored by third parties in your web application while protecting against XSS.

You can use prepackaged policies:

Sanitizers.FORMATTING.and(Sanitizers.LINKS)

or, as the tests show, you can easily configure your own:

PolicyFactory policy = new HtmlPolicyBuilder()
    .allowElements("a")
    .allowUrlProtocols("https")
    .allowAttributes("href").onElements("a")
    .requireRelNofollowOnLinks()
    .toFactory();

or write custom policies to do things like changing h1 elements to div elements with a certain class:

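PolicyFactory policy = new HtmlPolicyBuilder()
    .allowElements("h1", "p")
    .allowElements(
        new ElementPolicy() {
          @Override
          public String apply(String elementName, List<String> attrs) {
            // Attributes are stored as alternating name/value entries.
            attrs.add("class");
            attrs.add("header-" + elementName);
            // Returning a different name renames the element.
            return "div";
          }
        }, "h1")
    .toFactory();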
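For completeness, a minimal sketch of applying one of the prepackaged policies to an untrusted string. It assumes the owasp-java-html-sanitizer artifact is on the classpath; the class name `SanitizeExample` and the sample input are made up for illustration.

```java
import org.owasp.html.PolicyFactory;
import org.owasp.html.Sanitizers;

// Hypothetical wrapper around a prepackaged OWASP policy.
class SanitizeExample {
    // Basic formatting elements plus links, as in the answer above.
    private static final PolicyFactory POLICY =
            Sanitizers.FORMATTING.and(Sanitizers.LINKS);

    static String sanitize(String untrustedHtml) {
        // Elements and attributes the policy does not allow are dropped.
        return POLICY.sanitize(untrustedHtml);
    }

    public static void main(String[] args) {
        System.out.println(sanitize("<b>hello</b><script>alert(1)</script>"));
    }
}
```

A `PolicyFactory` built with `HtmlPolicyBuilder` can be used the same way through its `sanitize` method.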

Answer 2

JTidy may help you.

Answer 3

Apart from JTidy, you can also take a look at:
NekoHTML
TagSoup
Getting text in HTML document

Answer 4

The HTML parser jsoup also supports sanitization through policies: http://jsoup.org/cookbook/cleaning-html/whitelist-sanitizer
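A minimal sketch of that approach, assuming a recent jsoup version on the classpath. In recent releases the policy class is called `Safelist`; older releases, including the one the linked cookbook page describes, call it `Whitelist`. The class name `JsoupCleanExample` is made up for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.safety.Safelist;

// Hypothetical wrapper around jsoup's cleaner.
class JsoupCleanExample {
    static String clean(String untrustedHtml) {
        // Safelist.basic() permits simple text formatting and links;
        // anything outside the safelist is removed.
        return Jsoup.clean(untrustedHtml, Safelist.basic());
    }

    public static void main(String[] args) {
        System.out.println(clean("<p>Hi <b>there<script>alert(1)</script>"));
    }
}
```

Because jsoup parses the input into a document tree before serializing it again, unclosed tags such as the `<b>` above come out closed, which covers the requirement in the question.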

Answer 5

http://roberto.open-lab.com/2009/11/05/a-java-html-sanitizer-also-against-xss/

5 answers

Answer 1: score 24, 2012-01-17 17:15:36
Answer 2: score 10, accepted, 2009-12-22 15:23:29
Answer 3: score 2, 2009-12-22 15:39:10
Answer 4: score 2, 2013-12-18 23:01:01
Answer 5: score 1, 2010-01-05 19:03:46
