Decoding a string escaped by VBScript in Java

Question

I tried to decode the following string:

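import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import org.apache.commons.lang.StringEscapeUtils; // Apache Commons Lang 2.x

String str = "AT%26amp%3BT%20Network%20Client%20%u2013%20IBM";

System.out.println(StringEscapeUtils.unescapeHtml(str));
try {
    System.out.println("res:" + URLDecoder.decode(str, "UTF-8"));
} catch (UnsupportedEncodingException e) {
    // Cannot happen in practice: every Java platform must support UTF-8
    e.printStackTrace();
}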

Both methods fail: unescapeHtml returns the string unchanged, and URLDecoder.decode throws an exception:

AT%26amp%3BT%20Network%20Client%20%u2013%20IBM
Exception in thread "main" java.lang.IllegalArgumentException: URLDecoder: Illegal hex characters in escape (%) pattern - For input string: "u2"
    at java.net.URLDecoder.decode(URLDecoder.java:173)
    at decrypt.DecryptHtml.main(DecryptHtml.java:19)

The string comes from a VBScript script that uses the Escape function. How can I decode this string?
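For contrast, a minimal sketch of what standard percent-encoding looks like in Java: URLEncoder writes U+2013 (EN DASH) as its UTF-8 bytes, %E2%80%93, which URLDecoder accepts, while the %u2013 form is rejected because URLDecoder expects exactly two hex digits after every %. The class and method names here are hypothetical, chosen for illustration only.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class: standard percent-encoding vs. VBScript's %uXXXX form
final class PercentEncodingContrast {

    // Standard form: encode the text as UTF-8 bytes, each byte written as %XX
    static String standardEscape(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError("UTF-8 is always supported", e);
        }
    }

    // Returns true if java.net.URLDecoder can decode the input without throwing
    static boolean urlDecoderAccepts(String s) {
        try {
            URLDecoder.decode(s, "UTF-8");
            return true;
        } catch (IllegalArgumentException e) {
            return false; // e.g. "%u2013": 'u' is not a hex digit
        } catch (UnsupportedEncodingException e) {
            throw new AssertionError("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(standardEscape("\u2013"));                      // %E2%80%93
        System.out.println(urlDecoderAccepts("Client%20%E2%80%93%20IBM")); // true
        System.out.println(urlDecoderAccepts("Client%20%u2013%20IBM"));    // false
    }
}
```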

Answer 1

Unfortunately, from reading the documentation, it appears that Microsoft Has Done It Again (tm): "non-standard xxx", where "xxx" here is "escaping format".

Specifically, the documentation of the VBScript Escape function says:

[...] Unicode characters that have a value greater than 255 are stored using the %uxxxx format.

(Hey, MS: there is no such thing as "Unicode characters"; those are called code points.)

Great. So you need your own decoding function.
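Before writing the decoder, it can help to pin down the format it has to reverse. Below is a hypothetical encoder sketch (VbsEscapeSketch is not a real API) based on the documented rule: characters up to 255 become %XX, anything above becomes %uXXXX. Which characters are left unescaped is an assumption here: letters, digits and @*_+-./, as JavaScript's escape() does, with uppercase hex digits. With those assumptions it reproduces the string from the question exactly.

```java
// Hypothetical re-implementation of the VBScript Escape format, for illustration only
final class VbsEscapeSketch {

    // ASSUMPTION: punctuation left unescaped, copied from JavaScript's escape()
    private static final String SAFE = "@*_+-./";

    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i); // one UTF-16 code unit
            if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9') || SAFE.indexOf(c) >= 0) {
                sb.append(c);                                  // left as-is
            } else if (c <= 0xFF) {
                sb.append(String.format("%%%02X", (int) c));  // %XX
            } else {
                sb.append(String.format("%%u%04X", (int) c)); // %uXXXX
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Prints AT%26amp%3BT%20Network%20Client%20%u2013%20IBM
        System.out.println(escape("AT&amp;T Network Client \u2013 IBM"));
    }
}
```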

Fortunately, we use Java. This proprietary escape sequence has exactly four hex digits, so it only covers Unicode code points in the Basic Multilingual Plane (U+0000 to U+FFFF). Since a Java char is a UTF-16 code unit, and every BMP code point is exactly one UTF-16 code unit, each %uxxxx escape maps to a single char, which makes our job a little easier.
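A side effect of the one-escape-per-char mapping, sketched under an assumption: if Escape writes a code point above U+FFFF as two %uXXXX escapes, one per UTF-16 surrogate (as JavaScript's escape() does; not confirmed for VBScript here), then decoding each escape to one char rebuilds the surrogate pair with no extra work. SurrogateDemo and decodeUEscapes are hypothetical names, and the helper only accepts input made entirely of %uXXXX escapes.

```java
// Hypothetical demo: decoding %uXXXX escapes one char at a time
final class SurrogateDemo {

    // Decodes input consisting only of "%uXXXX" escapes
    static String decodeUEscapes(String s) {
        if (s.length() % 6 != 0) {
            throw new IllegalArgumentException("input must be a sequence of %uXXXX escapes");
        }
        StringBuilder sb = new StringBuilder(s.length() / 6);
        for (int i = 0; i < s.length(); i += 6) {
            if (s.charAt(i) != '%' || s.charAt(i + 1) != 'u') {
                throw new IllegalArgumentException("expected %uXXXX at index " + i);
            }
            // Each escape becomes exactly one UTF-16 code unit (a Java char)
            sb.append((char) Integer.parseInt(s.substring(i + 2, i + 6), 16));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String pair = decodeUEscapes("%uD83D%uDE00");
        System.out.println(pair.length());                            // 2 chars
        System.out.println(pair.codePointCount(0, pair.length()));    // 1 code point
        System.out.println(Integer.toHexString(pair.codePointAt(0))); // 1f600
    }
}
```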

Here is the code:

import java.nio.CharBuffer;

/**
 * Decodes strings produced by the VBScript Escape function:
 * %XX encodes a character in the range 0..255, %uXXXX a BMP code point.
 */
public final class MSUnescaper
{
    private static final char PERCENT = '%';
    private static final char NONSTANDARD_PCT_ESCAPE = 'u';

    // Utility class: no instances
    private MSUnescaper()
    {
    }

    public static String unescape(final String input)
    {
        final StringBuilder sb = new StringBuilder(input.length());
        final CharBuffer buf = CharBuffer.wrap(input);

        char c;

        while (buf.hasRemaining()) {
            c = buf.get();
            // Ordinary characters are copied through unchanged
            if (c != PERCENT) {
                sb.append(c);
                continue;
            }
            if (!buf.hasRemaining())
                throw new IllegalArgumentException("truncated escape at end of input");
            c = buf.get();
            // "%u" starts a four-digit escape; anything else is a two-digit one
            sb.append(c == NONSTANDARD_PCT_ESCAPE
                ? msEscape(buf) : standardEscape(buf, c));
        }

        return sb.toString();
    }

    // Decodes %XX: c is the first hex digit, the second is read from buf
    private static char standardEscape(final CharBuffer buf, final char c)
    {
        if (!buf.hasRemaining())
            throw new IllegalArgumentException("truncated %XX escape");
        final char[] array = { c, buf.get() };
        return (char) Integer.parseInt(new String(array), 16);
    }

    // Decodes the four hex digits following "%u" into one UTF-16 code unit
    private static char msEscape(final CharBuffer buf)
    {
        if (buf.remaining() < 4)
            throw new IllegalArgumentException("truncated %uXXXX escape");
        final char[] array = new char[4];
        buf.get(array);
        return (char) Integer.parseInt(new String(array), 16);
    }

    public static void main(final String... args)
    {
        final String input = "AT%26amp%3BT%20Network%20Client%20%u2013%20IBM";
        System.out.println(unescape(input));
    }
}

Output:

AT&amp;T Network Client – IBM
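For comparison, the same decoding can be written with java.util.regex instead of a hand-written scanner. This is a swapped-in technique, not the answer's code, and RegexUnescaper is a hypothetical name. One behavioral difference: malformed sequences such as a lone % or %u20 are copied through unchanged instead of raising IllegalArgumentException, which may or may not be what you want.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Alternative sketch: regex-based decoding of %XX and %uXXXX escapes
final class RegexUnescaper {

    // Group 1: four hex digits after %u; group 2: two hex digits after %
    private static final Pattern ESCAPE =
        Pattern.compile("%u([0-9A-Fa-f]{4})|%([0-9A-Fa-f]{2})");

    static String unescape(String input) {
        Matcher m = ESCAPE.matcher(input);
        StringBuffer sb = new StringBuffer(input.length());
        while (m.find()) {
            String hex = m.group(1) != null ? m.group(1) : m.group(2);
            char decoded = (char) Integer.parseInt(hex, 16);
            // quoteReplacement stops '$' and '\' from being read as group references
            m.appendReplacement(sb, Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        m.appendTail(sb); // copy the text after the last escape
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(unescape("AT%26amp%3BT%20Network%20Client%20%u2013%20IBM"));
    }
}
```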

Answer 2

String str = "AT%26amp%3BT%20Network%20Client%20%u2013%20IBM": I think this string is invalid because of the u after the % in %u2013. %u20 is not a valid character. If you remove the u from your string, you can decode it. For reference: w3schools HTML URL Encoding

2 answers

Answer 1: score 3, accepted, 2014-03-25 07:57:22

Answer 2: score -1, 2014-03-25 07:52:36
