Link extraction using owasp-java-html-sanitizer
I'm planning to use the owasp-java-html-sanitizer to perform a few tasks on user-generated HTML.
I'd like to extract a list of the URLs from the HTML string.
I would also like to make sure all links have their target set to "_blank"; this seems similar to the HtmlPolicyBuilder.requireRelNofollowOnLinks configuration. (done)
PolicyFactory linkRewrite = new HtmlPolicyBuilder()
    .allowAttributes("href").onElements("a")
    .requireRelNofollowOnLinks()
    .allowElements(new ElementPolicy() {
      public String apply(String elementName, List<String> attrs) {
        attrs.add("target");
        attrs.add("_blank");
        return "a";
      }
    }, "a")
    .toFactory();
This adds target="_blank" to links, though I'm not sure it's the best way to accomplish it.
This also extracts the URLs into urls, a collection declared outside the policy:
.allowElements(new ElementPolicy() {
  public String apply(String elementName, List<String> attrs) {
    for (int i = 0, n = attrs.size(); i < n; i += 2) {
      if ("href".equals(attrs.get(i))) {
        urls.add(attrs.get(i + 1));
        break;
      }
    }
    attrs.add("target");
    attrs.add("_blank");
    return elementName;
  }
}, "a")
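The attrs list passed to ElementPolicy.apply alternates attribute names and values, which is why the loop above steps by two. Below is a minimal plain-Java sketch of that layout. FlatAttrs is a hypothetical helper, not part of the sanitizer library; it reads a value and sets one without appending a duplicate pair.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of owasp-java-html-sanitizer) for flat name/value lists. */
final class FlatAttrs {
    private FlatAttrs() {}

    /** Returns the value of the first attribute called name, or null if it is absent. */
    static String get(List<String> attrs, String name) {
        // Layout assumed: [name0, value0, name1, value1, ...]
        for (int i = 0, n = attrs.size(); i + 1 < n; i += 2) {
            if (name.equals(attrs.get(i))) {
                return attrs.get(i + 1);
            }
        }
        return null;
    }

    /** Sets name to value, overwriting an existing pair rather than appending a second one. */
    static void set(List<String> attrs, String name, String value) {
        for (int i = 0, n = attrs.size(); i + 1 < n; i += 2) {
            if (name.equals(attrs.get(i))) {
                attrs.set(i + 1, value);
                return;
            }
        }
        attrs.add(name);
        attrs.add(value);
    }

    public static void main(String[] args) {
        // Example input standing in for what an ElementPolicy would receive for an <a> element.
        List<String> attrs = new ArrayList<>(
            Arrays.asList("href", "http://example.com/", "rel", "nofollow"));
        System.out.println(get(attrs, "href"));
        set(attrs, "target", "_blank");
        System.out.println(attrs);
    }
}
```

Inside an ElementPolicy, the same two operations would replace the hand-written loop and the paired attrs.add calls.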
.allowElements(
    new ElementPolicy() {
      public String apply(String elementName, List<String> attrs) {
        // Make sure that all links open in new windows/tabs without
        // using <base target> which also affects unsanitized links.
        attrs.add("target");
        attrs.add("_blank");
        return elementName;
      }
    }, "a")
.allowAttributes("href").matching(
    new AttributePolicy() {
      public String apply(String elementName, String attributeName, String value) {
        // Collect all link URLs.
        urls.add(value);
        return value;
      }
    }).onElements("a")