How to convert special characters in a string to unicode?

I couldn't find an answer to this problem; I tried combining several answers here to find something that works, to no avail. An application I'm working on uses a user's name to create PDFs with that name in them. However, when someone's name contains a special character, as in "Yağmur", the PDF creator freaks out and omits the special character. When it gets the Unicode equivalent ("Ya&#287;mur") instead, it prints "Yağmur" in the PDF as it should.

How do I check a name/string for any special character (regex = "[^a-z0-9 ]") and, when one is found, replace that character with its Unicode equivalent and return the new string?
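A minimal sketch of the regex-driven replacement the question describes, assuming the decimal numeric character reference form (&#287;) is what the PDF creator expects. The class name RegexEntityEscaper and the method name escapeSpecial are made up for illustration, and the character class adds A-Z to the question's pattern so that uppercase ASCII letters such as the "Y" in "Yağmur" are left alone.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexEntityEscaper {
    // The question's pattern plus A-Z (an assumption): anything that is not
    // an ASCII letter, an ASCII digit, or a space counts as "special".
    private static final Pattern SPECIAL = Pattern.compile("[^a-zA-Z0-9 ]");

    // Replaces every match with its decimal numeric character reference.
    static String escapeSpecial(String input) {
        Matcher m = SPECIAL.matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Java regex matches whole code points, so a surrogate pair
            // arrives here as one two-char group.
            int codePoint = m.group().codePointAt(0);
            m.appendReplacement(sb, Matcher.quoteReplacement("&#" + codePoint + ";"));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        // \u011F is the letter ğ, written as an escape so the source file
        // encoding does not matter.
        System.out.println(escapeSpecial("Ya\u011Fmur"));
    }
}
```

Note that this pattern also converts ASCII punctuation such as apostrophes and hyphens; widen the character class if those should pass through unchanged.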

I will try to give a generic solution, since the framework you are using is not mentioned in your problem statement.

I faced the same kind of issue a long time ago. The PDF engine should handle this if you set the text/character encoding to UTF-8. Find out how to set the encoding for PDF generation in your framework and try it out. Hope it helps!
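Since the PDF framework is not named, no library call can be shown here. The sketch below only illustrates why the encoding matters, using the JDK's own charsets: a round trip through UTF-8 keeps "ğ", while a round trip through ISO-8859-1 (which cannot represent it) turns it into "?". The class name EncodingCheck and the method name roundTrip are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingCheck {
    // Encodes the string to bytes and decodes it again with the same charset.
    // Characters the charset cannot represent are replaced during encoding.
    static String roundTrip(String s, Charset charset) {
        return new String(s.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String name = "Ya\u011Fmur"; // \u011F is the letter ğ
        // UTF-8 can encode every Unicode character, so the name survives.
        System.out.println(roundTrip(name, StandardCharsets.UTF_8).equals(name));
        // ISO-8859-1 has no ğ, so it is replaced by '?'.
        System.out.println(roundTrip(name, StandardCharsets.ISO_8859_1));
    }
}
```

If a name comes out of a check like this with a "?" or a missing letter, the loss likely happens before the text reaches the PDF engine, and escaping the character afterwards will not bring it back.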

One hackish way to do this would be as follows:

/*
 * TODO: poorly named
 * Replaces every non-ASCII code point in the input with its decimal
 * numeric character reference, e.g. U+011F becomes &#287;
 */
public static String convertUnicodePoints(String input) {
    // initializing output
    StringBuilder sb = new StringBuilder();
    // iterating by code point rather than by char, so that characters
    // outside the BMP (surrogate pairs) are converted as a single unit
    for (int i = 0; i < input.length(); ) {
        int codePoint = input.codePointAt(i);
        // code points below 128 are plain ASCII and are kept unchanged;
        // the boundary is a choice, adjust it as needed
        if (codePoint < 128) {
            sb.appendCodePoint(codePoint);
        }
        // need to "convert", code point >= boundary
        else {
            // for hex representation, zero-padded to at least 4 digits:
            // sb.append(String.format("&#x%04X;", codePoint));

            // for decimal representation, which is what you want here
            sb.append(String.format("&#%d;", codePoint));
        }
        // advance by 1 char, or by 2 for a surrogate pair
        i += Character.charCount(codePoint);
    }
    return sb.toString();
}

If you execute System.out.println(convertUnicodePoints("Yağmur")); you'll get: Ya&#287;mur

Of course, you can play with the "conversion" logic and decide which ranges get converted.
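One way to make the converted range adjustable is to pass the decision in as a predicate. This is only a sketch: the class name EntityEncoder, the method name encode, and the parameter name needsConversion are all made up for illustration.

```java
import java.util.function.IntPredicate;

class EntityEncoder {
    // Converts each code point for which needsConversion returns true into a
    // decimal numeric character reference; all others are copied unchanged.
    static String encode(String input, IntPredicate needsConversion) {
        StringBuilder sb = new StringBuilder();
        input.codePoints().forEach(codePoint -> {
            if (needsConversion.test(codePoint)) {
                sb.append("&#").append(codePoint).append(';');
            } else {
                sb.appendCodePoint(codePoint);
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        String name = "Ya\u011Fmur"; // \u011F is the letter ğ
        // Convert everything outside the ASCII range.
        System.out.println(encode(name, cp -> cp > 127));
        // Convert only characters that are neither letters nor digits;
        // ğ is a letter, so it is left as it is here.
        System.out.println(encode(name, cp -> !Character.isLetterOrDigit(cp)));
    }
}
```

The first call converts the ğ to &#287;, while the second leaves the name untouched, which shows how much the outcome depends on the chosen range.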
