OWASP html sanitizer - Why does it unescape some entities?

Question

I'm new to OWASP and its HTML sanitizer, and I find that with any policy I use, it unescapes some entities back into characters.

For example this string:

&#64; test &#33;

gets turned into this:

&#64; test !
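Both strings spell the same text. As a quick check, here is a minimal sketch that decodes only numeric character references (decimal and hex) and shows that the input and the sanitizer's output decode to identical text. NumericRefDecoder is a hypothetical helper, not part of the OWASP library, and it deliberately ignores named entities such as &amp;.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: decodes only numeric character references like &#33; and &#x21;. */
public class NumericRefDecoder {
    // Group 1 captures a decimal value, group 2 a hexadecimal value.
    private static final Pattern NUMERIC_REF =
        Pattern.compile("&#(?:([0-9]+)|[xX]([0-9a-fA-F]+));");

    public static String decode(String s) {
        Matcher m = NUMERIC_REF.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = m.group(1) != null
                ? Integer.parseInt(m.group(1))
                : Integer.parseInt(m.group(2), 16);
            // No validation: out-of-range code points throw IllegalArgumentException.
            String ch = new String(Character.toChars(codePoint));
            m.appendReplacement(out, Matcher.quoteReplacement(ch));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "&#64; test &#33;";
        String sanitized = "&#64; test !";
        System.out.println(decode(input));     // @ test !
        System.out.println(decode(sanitized)); // @ test !
        System.out.println(decode(input).equals(decode(sanitized))); // true
    }
}
```

In other words, &#64; is U+0040 (@) and &#33; is U+0021 (!), so the change is one of spelling, not of content.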

I'd like to leave the entities as-is as much as possible. I'd even understand it if the sanitizer were escaping them rather than unescaping them.

Is this possible with the sanitizer? It seems to happen no matter which policy I try.

Here's the code I'm running for my simple test:

package com.my.company.test;

import static org.junit.Assert.assertEquals;

import org.junit.Test;
import org.owasp.html.PolicyFactory;
import org.owasp.html.Sanitizers;

public class OwaspSanitizerTest {
  public static final PolicyFactory POLICY = Sanitizers.IMAGES;

  // JUnit 4 test methods must be public, non-static instance methods.
  @Test
  public void testTextFilter() throws Exception {
    String data = "&#64; test &#33;";
    String result = POLICY.sanitize(data);

    System.out.println(result); // prints "&#64; test !"

    // Fails: the sanitizer returns "&#64; test !" instead of the original string.
    assertEquals("&#64; test &#33;", result);
  }
}

EDIT: The reason I ask is that I want my users' input to match what we output as much as possible. I understand that this won't be possible in some situations, but I would have expected it to be possible in this case.

Answer 1

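The sanitizer decodes text nodes and then re-encodes them to head off encoding-level attacks, and to keep the output as close as possible to the intersection of HTML and XML, which minimizes the chance that a naive post-processor will reintroduce a vulnerability. As a result, the output depends only on the decoded text, not on how the input spelled each character: &#33; comes back as a literal !, while @ is re-emitted as &#64;.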
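A minimal sketch of the re-encoding half, assuming the text node has already been decoded (so "&#64; test &#33;" has become "@ test !"). The class name TextReencoder and the set of escaped characters are assumptions for illustration, chosen so that the question's example reproduces; they are not taken from the OWASP source, whose actual escaping rules may differ.

```java
/** Hypothetical sketch of re-encoding already-decoded text; not the OWASP implementation. */
public class TextReencoder {
    // The escaped set below is an assumption, picked to reproduce the question's output.
    public static String encode(String decodedText) {
        StringBuilder out = new StringBuilder(decodedText.length());
        for (int i = 0; i < decodedText.length(); i++) {
            char c = decodedText.charAt(i);
            switch (c) {
                case '&':  out.append("&amp;"); break;
                case '<':  out.append("&lt;");  break;
                case '>':  out.append("&gt;");  break;
                case '"':  out.append("&#34;"); break;
                case '\'': out.append("&#39;"); break;
                case '@':  out.append("&#64;"); break;
                // '!' and other ordinary characters are emitted literally.
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("@ test !"));  // &#64; test !
        System.out.println(encode("a < b & c")); // a &lt; b &amp; c
    }
}
```

Because the encoder only ever sees decoded characters, it cannot tell whether the input wrote ! or &#33;, which is why the original spelling cannot survive a decode/re-encode round trip.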

Accepted answer (score 2), 2014-03-26 01:07:10

OWASP html sanitizer - Why does it unescape some entities?

Question

1 answers

solution1 2 ACCPTED 2014-03-26 01:07:10

solution1
2 ACCPTED 2014-03-26 01:07:10