How do I convert special characters in a string to unicode?

Question

I couldn't find an answer to this problem. I tried combining several answers here to get something that works, to no avail. An application I'm working on uses a user's name to create PDFs with that name in them. However, when someone's name contains a special character, as in "Yağmur", the PDF creator freaks out and omits the special character. When it gets the unicode equivalent ("Ya&#287;mur") instead, it prints "Yağmur" in the PDF as it should.

How do I check a name/string for any special character (regex = "[^a-z0-9 ]") and, when one is found, replace that character with its unicode equivalent and return the new unicoded string?

Answer 1

I will try to give the solution in a generic way, as the framework you are using is not mentioned in your problem statement.

I faced the same kind of issue a long time back. This should be handled by the PDF engine if you set the text/char encoding to UTF-8. Please find out how to set the encoding in your framework for PDF generation and try it out. Hope it helps!

Answer 2

One hackish way to do this would be as follows:

/*
 * TODO: poorly named
 */
public static String convertUnicodePoints(String input) {
    // initializing output
    StringBuilder sb = new StringBuilder();
    // iterating input by code point, so that surrogate pairs stay together
    for (int i = 0; i < input.length(); ) {
        int codePoint = input.codePointAt(i);
        // checking character code point to infer whether "conversion" is required
        // here, picking an arbitrary code point 125 as boundary
        if (codePoint < 125) {
            sb.appendCodePoint(codePoint);
        }
        // need to "convert", code point >= boundary
        else {
            // for hex representation: prepends as many 0s as required
            // to get a hex string of the code point, at least 4 characters long
            // sb.append(String.format("&#x%04X;", codePoint));

            // for decimal representation, which is what you want here
            sb.append(String.format("&#%d;", codePoint));
        }
        // advancing by 1 char, or by 2 for a surrogate pair
        i += Character.charCount(codePoint);
    }
    return sb.toString();
}

If you execute:

System.out.println(convertUnicodePoints("Yağmur"));

... you'll get: Ya&#287;mur

Of course, you can play with the "conversion" logic and decide which ranges get converted.
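As one possible variation, the choice of which characters get converted could be driven by a regex, as the question suggests. The sketch below is only an illustration: the class name EntityEscaper and the method escapeMatching are hypothetical, and the pattern "[^A-Za-z0-9 ]" is an assumed adjustment of the question's "[^a-z0-9 ]", which would also convert uppercase letters such as "Y".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class EntityEscaper {

    // Replaces every match of `special` with a decimal numeric
    // character reference built from the match's first code point.
    static String escapeMatching(String input, Pattern special) {
        Matcher m = special.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = m.group().codePointAt(0);
            // quoteReplacement guards against '$' and '\' being
            // interpreted as group references in the replacement
            m.appendReplacement(out,
                    Matcher.quoteReplacement("&#" + codePoint + ";"));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Assumed pattern: anything that is not an ASCII letter, digit or space
        Pattern special = Pattern.compile("[^A-Za-z0-9 ]");
        // "\u011F" is the letter g with breve from "Yağmur"
        System.out.println(escapeMatching("Ya\u011Fmur", special));
    }
}
```

With that pattern, punctuation such as "-" or "'" is converted as well, so the character class may need widening depending on what the PDF library accepts.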

2 answers

Answer 1: 1, accepted, 2015-08-27 12:14:06
Answer 2: 0, 2015-08-27 12:17:11
