
How to convert special characters in a string to unicode?

I couldn't find an answer to this problem. I have tried combining several answers from here to get something that works, to no avail. An application I'm working on uses a user's name to create PDFs with that name in them. However, when someone's name contains a special character, as in "Yağmur", the PDF creator freaks out and omits the special character. When it gets the Unicode equivalent ("Ya&#287;mur") instead, it prints "Yağmur" in the PDF as it should.

How do I check a name/string for any special character (regex = "[^a-z0-9 ]"), replace each such character with its Unicode equivalent, and return the new string?

I'll answer generically, since the question doesn't mention which framework you are using.

I faced the same kind of issue a while back. The PDF engine should handle this itself if you set the text/character encoding to UTF-8. Check how to set the encoding in the framework you use for PDF generation and try that. Hope it helps!
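To see why the encoding matters, here is a minimal sketch that round-trips the name through two charsets using only the JDK. The class and method names (EncodingRoundTrip, roundTrip) are made up for illustration, and whether your PDF library does something similar internally is an assumption, not something the question confirms.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingRoundTrip {

    // Hypothetical helper: encodes the text to bytes and decodes it again
    // with the same charset, the way a library bound to one encoding might.
    public static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        // \u011F is the letter g with breve from "Yağmur"; the escape keeps
        // this file independent of the compiler's source encoding.
        String name = "Ya\u011Fmur";

        // UTF-8 can represent every code point, so the name survives.
        System.out.println(roundTrip(name, StandardCharsets.UTF_8).equals(name)); // true

        // ISO-8859-1 has no mapping for U+011F, so getBytes substitutes '?'.
        System.out.println(roundTrip(name, StandardCharsets.ISO_8859_1)); // Ya?mur
    }
}
```

If the character disappears or turns into "?" in your PDF, a single-byte encoding (or a font without that glyph) somewhere in the pipeline is a likely suspect.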

One hackish way to do this would be as follows:

/*
 * Replaces every non-ASCII character with a decimal numeric character
 * reference, e.g. "ğ" becomes "&#287;".
 * TODO: poorly named
 */
public static String convertUnicodePoints(String input) {
    // initializing output
    StringBuilder sb = new StringBuilder();
    // iterating input by code point, so surrogate pairs are not split
    for (int i = 0; i < input.length(); ) {
        int codePoint = input.codePointAt(i);
        // checking the code point to infer whether "conversion" is required;
        // 128 is the boundary here, i.e. everything below it is plain ASCII
        if (codePoint < 128) {
            sb.appendCodePoint(codePoint);
        }
        // need to "convert", code point >= boundary
        else {
            // for hex representation, zero-padded to at least 4 digits:
            // sb.append(String.format("&#x%04X;", codePoint));

            // for decimal representation, which is what you want here
            sb.append(String.format("&#%d;", codePoint));
        }
        // advance by 1 char, or by 2 for a supplementary code point
        i += Character.charCount(codePoint);
    }
    return sb.toString();
}

If you execute:

System.out.println(convertUnicodePoints("Yağmur"));

... you'll get:

Ya&#287;mur

Of course, you can play with the "conversion" logic and decide which ranges get converted.
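As a rough sketch of what "deciding which ranges get converted" could look like, the variant below takes the decision as a predicate. The names NumericEntityEscaper, escape, escapeNonAscii and escapeNonAlphanumeric are invented for this example, and treating the question's regex [^a-z0-9 ] as "anything that is not a lowercase ASCII letter, a digit or a space" is my reading of it.

```java
import java.util.function.IntPredicate;

public class NumericEntityEscaper {

    // Replaces every code point accepted by needsEscape with a decimal
    // numeric character reference such as &#287;.
    public static String escape(String input, IntPredicate needsEscape) {
        StringBuilder sb = new StringBuilder(input.length());
        // codePoints() yields whole code points, so a surrogate pair
        // becomes one reference instead of two broken ones.
        input.codePoints().forEach(cp -> {
            if (needsEscape.test(cp)) {
                sb.append("&#").append(cp).append(';');
            } else {
                sb.appendCodePoint(cp);
            }
        });
        return sb.toString();
    }

    // Escapes everything outside 7-bit ASCII.
    public static String escapeNonAscii(String input) {
        return escape(input, cp -> cp > 127);
    }

    // Follows the question's regex [^a-z0-9 ]: anything that is not a
    // lowercase ASCII letter, a digit or a space is escaped (uppercase too).
    public static String escapeNonAlphanumeric(String input) {
        return escape(input, cp ->
                !((cp >= 'a' && cp <= 'z') || (cp >= '0' && cp <= '9') || cp == ' '));
    }

    public static void main(String[] args) {
        // \u011F is the g with breve from "Yağmur"
        System.out.println(escapeNonAscii("Ya\u011Fmur"));        // Ya&#287;mur
        System.out.println(escapeNonAlphanumeric("ya\u011Fmur")); // ya&#287;mur
    }
}
```

Note that the strict regex from the question also escapes uppercase letters, so "Yağmur" would come out as "&#89;a&#287;mur"; the non-ASCII predicate is probably closer to what you actually want.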
