
Decode an escaped string from VBScript in Java

I tried to decode the following string:

String str  = "AT%26amp%3BT%20Network%20Client%20%u2013%20IBM";

System.out.println(StringEscapeUtils.unescapeHtml(str));
try {
    System.out.println("res:"+java.net.URLDecoder.decode(str, "UTF-8"));
} catch (UnsupportedEncodingException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}

Both methods fail, as shown below:

AT%26amp%3BT%20Network%20Client%20%u2013%20IBM
Exception in thread "main" java.lang.IllegalArgumentException: URLDecoder: Illegal hex characters in escape (%) pattern - For input string: "u2"
    at java.net.URLDecoder.decode(URLDecoder.java:173)
    at decrypt.DecryptHtml.main(DecryptHtml.java:19)

The source of the string is a VBS script that uses the Escape function. How can I decode this string?

Unfortunately, from reading the documentation, it appears that Microsoft Has Done It Again (tm): "non standard xxx", where "xxx" here is "escaping format".

Specifically, the documentation of the VBScript function says:

[...] Unicode characters that have a value greater than 255 are stored using the %uxxxx format.

(Hey, MS: there is no such thing as "Unicode characters"; those are called code points.)

Great. So you need your own decoding function.

Fortunately, we use Java. This proprietary escape sequence only covers Unicode code points in the Basic Multilingual Plane (U+0000 to U+FFFF), a char is a UTF-16 code unit, and each BMP code point maps to exactly one UTF-16 code unit. Together, this makes our job a little easier: every escape decodes to a single char.
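To illustrate the point about the BMP and char, here is a minimal sketch. The class name BmpCheck and the helper toChar are made up for this illustration; only the java.lang.Character and Integer calls are standard. It shows that the four hex digits of an escape such as %u2013 always give a value that fits in one char.

```java
final class BmpCheck
{
    private BmpCheck()
    {
    }

    // Hypothetical helper: turns the four hex digits of a %uxxxx escape
    // into the corresponding char
    static char toChar(final String fourHexDigits)
    {
        final int codePoint = Integer.parseInt(fourHexDigits, 16);
        // Four hex digits max out at 0xFFFF, so the value fits in one char
        return (char) codePoint;
    }

    public static void main(final String... args)
    {
        final int codePoint = Integer.parseInt("2013", 16);
        // A BMP code point needs exactly one UTF-16 code unit
        System.out.println(Character.isBmpCodePoint(codePoint)); // true
        System.out.println(Character.charCount(codePoint));      // 1
        System.out.println(toChar("2013") == '\u2013');          // true
    }
}
```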

Here is the code:

import java.nio.CharBuffer;

public final class MSUnescaper
{
    private static final char PERCENT = '%';
    private static final char NONSTANDARD_PCT_ESCAPE = 'u';

    private MSUnescaper()
    {
    }

    public static String unescape(final String input)
    {
        final StringBuilder sb = new StringBuilder(input.length());
        final CharBuffer buf = CharBuffer.wrap(input);

        char c;

        while (buf.hasRemaining()) {
            c = buf.get();
            if (c != PERCENT) {
                sb.append(c);
                continue;
            }
            if (!buf.hasRemaining())
                throw new IllegalArgumentException();
            c = buf.get();
            sb.append(c == NONSTANDARD_PCT_ESCAPE
                ? msEscape(buf) : standardEscape(buf, c));
        }

        return sb.toString();
    }

    private static char standardEscape(final CharBuffer buf, final char c)
    {
        if (!buf.hasRemaining())
            throw new IllegalArgumentException();
        final char[] array = { c, buf.get() };
        return (char) Integer.parseInt(new String(array), 16);
    }

    private static char msEscape(final CharBuffer buf)
    {
        if (buf.remaining() < 4)
            throw new IllegalArgumentException();
        final char[] array = new char[4];
        buf.get(array);
        return (char) Integer.parseInt(new String(array), 16);
    }

    public static void main(final String... args)
    {
        final String input = "AT%26amp%3BT%20Network%20Client%20%u2013%20IBM";
        System.out.println(unescape(input));
    }
}

Output:

AT&amp;T Network Client – IBM
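For comparison, the same decoding can probably also be sketched with a regular expression. This is an alternative under stated assumptions, not the code above: the class name RegexMSUnescaper is invented here, and unlike the CharBuffer version it leaves malformed escapes (for example a trailing %) untouched instead of throwing IllegalArgumentException.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexMSUnescaper
{
    // Matches either %uXXXX (four hex digits) or %XX (two hex digits)
    private static final Pattern ESCAPE
        = Pattern.compile("%(?:u([0-9A-Fa-f]{4})|([0-9A-Fa-f]{2}))");

    private RegexMSUnescaper()
    {
    }

    static String unescape(final String input)
    {
        final Matcher m = ESCAPE.matcher(input);
        final StringBuilder sb = new StringBuilder(input.length());
        int last = 0;

        while (m.find()) {
            // Copy the literal text before this escape unchanged
            sb.append(input, last, m.start());
            // Group 1 holds the %u digits, group 2 the standard %XX digits
            final String hex = m.group(1) != null ? m.group(1) : m.group(2);
            sb.append((char) Integer.parseInt(hex, 16));
            last = m.end();
        }
        // Copy whatever follows the last escape
        sb.append(input, last, input.length());
        return sb.toString();
    }

    public static void main(final String... args)
    {
        final String input = "AT%26amp%3BT%20Network%20Client%20%u2013%20IBM";
        // prints: AT&amp;T Network Client – IBM
        System.out.println(unescape(input));
    }
}
```

Whether silently passing malformed input through is acceptable depends on the caller; the stricter behavior of the original code may be preferable when the input must be validated.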

String str = "AT%26amp%3BT%20Network%20Client%20%[here]u[here]2013%20IBM"

I think this string is invalid. %u20 is not a valid character. If you remove the u from your string, you can decode it. For reference: w3schools HTML URL encoding
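A quick check of this suggestion, as a hedged sketch (the class name RemoveUCheck is made up for illustration): with the u removed, URLDecoder no longer throws, but it reads %20 as a space followed by the literal digits 13, so the original character (U+2013, an en dash) is not recovered.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class RemoveUCheck
{
    private RemoveUCheck()
    {
    }

    static String decodeWithoutU()
    {
        // The original input with the 'u' of "%u2013" removed
        final String stripped = "AT%26amp%3BT%20Network%20Client%20%2013%20IBM";
        try {
            return URLDecoder.decode(stripped, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available, so this should not happen
            throw new IllegalStateException(e);
        }
    }

    public static void main(final String... args)
    {
        // prints: AT&amp;T Network Client  13 IBM
        System.out.println(decodeWithoutU());
    }
}
```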
