
Decode an escaped string from VBScript in Java

I tried to decode the following string,

String str = "AT%26amp%3BT%20Network%20Client%20%u2013%20IBM";

try {
    System.out.println("res:" + java.net.URLDecoder.decode(str, "UTF-8"));
} catch (UnsupportedEncodingException e) {
    // TODO Auto-generated catch block
    e.printStackTrace();
}

The call fails as shown below:

Exception in thread "main" java.lang.IllegalArgumentException: URLDecoder: Illegal hex characters in escape (%) pattern - For input string: "u2"
    at java.net.URLDecoder.decode(URLDecoder.java:173)
    at decrypt.DecryptHtml.main(DecryptHtml.java:19)

The source of the string is a VBS script that uses the Escape function. How can I decode this string?

Unfortunately, from reading the documentation, it appears that Microsoft Has Done It Again (tm): "non standard xxx", where here "xxx" is "escaping format".

Specifically, the documentation of the VBScript Escape function says:

[...]Unicode characters that have a value greater than 255 are stored using the %uxxxx format.

(Hey, MS: there is no such thing as "Unicode characters"; those are called code points.)

Great. So you need your own decoding function.

Fortunately, we use Java. This proprietary escape sequence only covers Unicode code points in the Basic Multilingual Plane (U+0000 to U+FFFF), a Java char is a UTF-16 code unit, and every BMP code point maps to exactly one UTF-16 code unit. Each %uxxxx sequence therefore decodes to a single char, which makes our job a little easier.
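To make the BMP point concrete, here is a minimal sketch showing that the four hex digits after %u fit into one char. The class name BmpCharDemo and the helper fromHex4 are made up for illustration and are not part of the answer's code.

```java
public final class BmpCharDemo {
    private BmpCharDemo() {
    }

    // Hypothetical helper: turns the four hex digits that follow "%u"
    // into a single UTF-16 code unit.
    public static char fromHex4(final String hex) {
        if (hex.length() != 4)
            throw new IllegalArgumentException("expected 4 hex digits: " + hex);
        return (char) Integer.parseInt(hex, 16);
    }

    public static void main(final String... args) {
        final char dash = fromHex4("2013");
        // U+2013 (EN DASH) lies in the BMP, so one char is enough to hold it.
        System.out.println(Character.charCount(dash)); // prints 1
        System.out.println(dash == 0x2013);            // prints true
    }
}
```

A code point above U+FFFF would need two chars (a surrogate pair), but the %uxxxx format cannot express one directly, so the cast to char is safe here.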

Here is the code:

import java.nio.CharBuffer;

public final class MSUnescaper {
    private static final char PERCENT = '%';
    private static final char NONSTANDARD_PCT_ESCAPE = 'u';

    private MSUnescaper() {
    }

    public static String unescape(final String input) {
        final StringBuilder sb = new StringBuilder(input.length());
        final CharBuffer buf = CharBuffer.wrap(input);

        char c;

        while (buf.hasRemaining()) {
            c = buf.get();
            // Anything that is not a percent sign is copied through unchanged
            if (c != PERCENT) {
                sb.append(c);
                continue;
            }
            if (!buf.hasRemaining())
                throw new IllegalArgumentException();
            c = buf.get();
            // %uXXXX is the Microsoft form; otherwise c is the first digit of %XX
            sb.append(c == NONSTANDARD_PCT_ESCAPE
                ? msEscape(buf) : standardEscape(buf, c));
        }

        return sb.toString();
    }

    // Decodes %XX: c is the first hex digit, the second is read from buf
    private static char standardEscape(final CharBuffer buf, final char c) {
        if (!buf.hasRemaining())
            throw new IllegalArgumentException();
        final char[] array = { c, buf.get() };
        return (char) Integer.parseInt(new String(array), 16);
    }

    // Decodes %uXXXX: reads the four hex digits that follow the 'u'
    private static char msEscape(final CharBuffer buf) {
        if (buf.remaining() < 4)
            throw new IllegalArgumentException();
        final char[] array = new char[4];
        buf.get(array);
        return (char) Integer.parseInt(new String(array), 16);
    }

    public static void main(final String... args) {
        final String input = "AT%26amp%3BT%20Network%20Client%20%u2013%20IBM";
        System.out.println(unescape(input));
    }
}

This prints:

AT&amp;T Network Client – IBM
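As an alternative to the hand-written CharBuffer loop, the same decoding can be sketched with a regular expression. This is not the answer's method, just a variant under one assumption: malformed escapes are left in the output untouched instead of raising an exception. The class name RegexUnescaper is made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class RegexUnescaper {
    // Group 1 captures the four digits of %uXXXX, group 2 the two digits of %XX.
    private static final Pattern ESCAPE =
        Pattern.compile("%u([0-9A-Fa-f]{4})|%([0-9A-Fa-f]{2})");

    private RegexUnescaper() {
    }

    public static String unescape(final String input) {
        final Matcher m = ESCAPE.matcher(input);
        final StringBuilder sb = new StringBuilder(input.length());
        int last = 0;
        while (m.find()) {
            // Copy the literal text between the previous match and this one
            sb.append(input, last, m.start());
            final String hex = m.group(1) != null ? m.group(1) : m.group(2);
            sb.append((char) Integer.parseInt(hex, 16));
            last = m.end();
        }
        // Copy whatever follows the last escape sequence
        sb.append(input, last, input.length());
        return sb.toString();
    }

    public static void main(final String... args) {
        System.out.println(
            unescape("AT%26amp%3BT%20Network%20Client%20%u2013%20IBM"));
    }
}
```

Because the pattern only matches well-formed sequences, a stray % (for example in "100%") passes through as-is, which may or may not be what you want. The stricter loop-based version rejects such input.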

String str = "AT%26amp%3BT%20Network%20Client%20%[here]u[here]2013%20IBM"

I think this string is invalid: %u20 is not a valid character escape. If you remove the u from your string, you can decode it. For reference: w3schools HTML URL encoding.
