
Concatenating chars with String.charAt() and + operator turns them into encoded UTF-8

I'm having trouble properly concatenating characters from a String[] using String.charAt() and the + operator in the following method.

private PcbGroup createPcbGroup(String[] metadata, PcbGroup pcbGroup) {
    char group_short = metadata[2].charAt(0);
    
    pcbGroup.setId(Integer.parseInt(metadata[0]));
    pcbGroup.setGroup_name(metadata[1]);
    for (int i = 1; i < metadata[2].length(); i++) {
        group_short += metadata[2].charAt(i);
    }
    pcbGroup.setGroup_short(group_short);

    // create and return pcbGroup of this metadata
    return pcbGroup;
}  

I'm reading a CSV file with BufferedReader and populating String[] metadata from it. The content of String[] metadata is [3, "Foo", ML]. The line char group_short = metadata[2].charAt(0); correctly assigns 'M' to char group_short. It then turns into ? (space intended) when I concatenate it with the second character 'L'.

When I save this object, Hibernate complains about an incorrect string value, which appears to be '\xC2\x99'.

Hibernate: insert into pcb_group (group_name, group_short, id) values (?, ?, ?)

2022-11-22 10:41:35.902  WARN 2988 --- [           main] o.h.engine.jdbc.spi.SqlExceptionHelper   : SQL Error: 1366, SQLState: HY000

2022-11-22 10:41:35.903 ERROR 2988 --- [           main] o.h.engine.jdbc.spi.SqlExceptionHelper   : (conn=3030) Incorrect string value: '\xC2\x99' for column 'group_short' at row 1

2022-11-22 10:41:35.903  INFO 2988 --- [           main] o.h.e.j.b.internal.AbstractBatchImpl     : HHH000010: On release of batch it still contained JDBC statements

I've been struggling with this little thing for a few hours now and it's getting on my nerves; I could use some help.

This line doesn't do what you seem to think it does:

group_short += metadata[2].charAt(i);

While this might look like string concatenation to you, it's not. group_short is of type char, meaning it holds a single character*.

What this does is add the code point value of each subsequent character to that of the first character, which doesn't produce anything semantically meaningful for your use case (one could argue that it's a very simple kind of hashing, but it's not even good at that).
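To make the arithmetic concrete, here is a minimal sketch that reproduces the loop from the question on the sample value "ML". The class and method names (CharAdditionDemo, sumChars, utf8Hex) are made up for illustration and are not part of the original code.

```java
import java.nio.charset.StandardCharsets;

class CharAdditionDemo {

    // Mirrors the loop from the question: += on a char adds numeric
    // code unit values and implicitly narrows the result back to char.
    static char sumChars(String s) {
        char result = s.charAt(0);
        for (int i = 1; i < s.length(); i++) {
            result += s.charAt(i);
        }
        return result;
    }

    // Formats the UTF-8 encoding of a single char as escaped hex bytes.
    static String utf8Hex(char c) {
        StringBuilder hex = new StringBuilder();
        for (byte b : String.valueOf(c).getBytes(StandardCharsets.UTF_8)) {
            hex.append(String.format("\\x%02X", b & 0xFF));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        char sum = sumChars("ML");
        System.out.println((int) sum);    // 153, i.e. 'M' (77) + 'L' (76)
        System.out.println(utf8Hex(sum)); // \xC2\x99
    }
}
```

The sum 153 is U+0099, a control character, and its UTF-8 encoding is exactly the '\xC2\x99' that appears in the Hibernate error.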

What you want to do is have a String (or ideally StringBuilder) variable and do proper concatenation:

String group_short = "" + metadata[2].charAt(0);

// and later in the loop:
group_short += metadata[2].charAt(i);

Note that with this code group_short will have exactly the same value as metadata[2] at the end of the loop, making all of that code equivalent to (but way less efficient than) group_short = metadata[2].
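For completeness, a StringBuilder version of the same loop might look like the sketch below. GroupShortDemo and buildGroupShort are hypothetical names, and the sketch assumes the entity's setter (setGroup_short in the question) would be changed to accept a String rather than a char.

```java
class GroupShortDemo {

    // Builds the short name character by character, as the original loop intended.
    static String buildGroupShort(String source) {
        StringBuilder sb = new StringBuilder(source.length());
        for (int i = 0; i < source.length(); i++) {
            sb.append(source.charAt(i)); // append(char) appends the character, no arithmetic
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] metadata = {"3", "Foo", "ML"}; // sample row from the question
        String groupShort = buildGroupShort(metadata[2]);
        System.out.println(groupShort);                     // ML
        System.out.println(groupShort.equals(metadata[2])); // true
    }
}
```

Since the result always equals the input, the loop is only worth keeping if some characters are meant to be skipped or transformed along the way.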

* Due to the complexity of Unicode and Java Strings using UTF-16, this is not entirely accurate, as multiple char values can be required to make up a single "character" in the "human language" sense, but it's close enough for this issue.
