
How to convert UTF-8 character to ISO Latin 1?

I need to convert a UTF-8 trademark sign to ISO Latin 1 and save it into a database, which is also ISO Latin 1 encoded.

How can I do that in Java?

I've tried something like

String s2 = new String(s1.getBytes("ISO-8859-1"), "utf-8");

but it does not seem to work as I expected.

A string in Java is always in Unicode (UTF-16, effectively). Conversions are only necessary when you're trying to go from text to a binary encoding or vice versa.
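
To make the text-versus-bytes distinction concrete, here is a minimal sketch (the class name `TextVersusBytes` and the helper `hex` are made up for illustration). It encodes the same one-character string with two charsets; only the resulting byte arrays differ, while the String itself carries no encoding.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: a String is text; an encoding only matters once it becomes bytes.
class TextVersusBytes {

    // Formats a byte array as space-separated upper-case hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String text = "\u00ae"; // REGISTERED SIGN, which ISO-8859-1 can represent

        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);        // two bytes
        byte[] latin1 = text.getBytes(StandardCharsets.ISO_8859_1); // one byte

        System.out.println("UTF-8:   " + hex(utf8));
        System.out.println("Latin-1: " + hex(latin1));

        // Decoding each array with the charset that produced it yields the same text.
        String fromUtf8 = new String(utf8, StandardCharsets.UTF_8);
        String fromLatin1 = new String(latin1, StandardCharsets.ISO_8859_1);
        System.out.println(fromUtf8.equals(fromLatin1));
    }
}
```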

What's the character involved? Are you sure it's even present in ISO Latin 1? If it is, I'd expect that character to be stored by your database without any problem. There's no such thing as a "UTF-8 trademark sign". You could have "the bytes representing the trademark sign UTF-8 encoded" but that would be a byte array, not a string.

EDIT: If you mean the Unicode trademark character U+2122, that's outside the range of ISO-Latin-1. There's the registered trademark character U+00AE, which isn't the same thing (either in appearance or in legal meaning, IIRC) but may be better than nothing. If you want to use that, then just use:

String replaced = original.replace('\u2122', '\u00ae');
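
Before relying on a substitution like this, it may help to check which characters ISO-8859-1 can actually represent. The following is only a sketch: the class `Latin1Check`, the method `fitsLatin1`, and the sample text are hypothetical, while `CharsetEncoder.canEncode` is the standard JDK call doing the work.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: checks whether a string survives encoding to ISO-8859-1.
class Latin1Check {

    // True if every character of the text has an ISO-8859-1 representation.
    static boolean fitsLatin1(String text) {
        // A fresh encoder per call; encoders are stateful and not thread-safe.
        return StandardCharsets.ISO_8859_1.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String original = "Acme\u2122";                          // contains TRADE MARK SIGN
        String replaced = original.replace('\u2122', '\u00ae'); // swap in REGISTERED SIGN

        System.out.println("original fits Latin-1: " + fitsLatin1(original));
        System.out.println("replaced fits Latin-1: " + fitsLatin1(replaced));
    }
}
```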

As far as I understand, you are trying to store a string (s1) that contains non-Latin-1 characters in a DB that only supports ISO-8859-1.

  • First, I agree with the others that it is a dirty idea.
    Note that CP1252 is close to ISO-8859-1 (1 byte per character) and includes ™.

  • Now, to answer your question, I think you did the opposite.
    You want to reinterpret the UTF-8 bytes as ISO-8859-1 characters:

     String s2 = new String(s1.getBytes("UTF-8"), "ISO-8859-1"); 

    This way, s2 is a String that, once encoded in ISO-8859-1, will return a byte array that is valid UTF-8 (the encoded form of s1).

    To retrieve the original string, you would do

     String s1 = new String(s2.getBytes("ISO-8859-1"),"UTF-8"); 

BUT WAIT! When doing this, you are hoping that any byte can be decoded with ISO-8859-1, that your DB will accept such data, and so on.

In fact, this is far from certain because officially, ISO-8859-1 does not define characters for every byte value, for instance 0x80 to 0x9F.

Then,

byte[] b = { -97, -100, -128 };
System.out.println( new String(b,"ISO-8859-1") );

would display ???

However, in Java, s.getBytes("ISO-8859-1") does restore the initial array.
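
A quick way to convince yourself of that last point is to round-trip all 256 byte values. This is a sketch with a hypothetical class name (`Latin1RoundTrip`); it relies on the JDK's ISO-8859-1 charset mapping each byte 0x00 to 0xFF to the code point with the same value. It says nothing about what the database does with such data.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo: decoding and re-encoding with ISO-8859-1 in Java preserves every byte.
class Latin1RoundTrip {

    // Decodes the bytes as ISO-8859-1 and encodes the resulting String again.
    static byte[] roundTrip(byte[] input) {
        String decoded = new String(input, StandardCharsets.ISO_8859_1);
        return decoded.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Build an array holding every possible byte value once.
        byte[] all = new byte[256];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        System.out.println("lossless: " + Arrays.equals(all, roundTrip(all)));
    }
}
```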

  1. Read what Jon Skeet told you. The code you posted is rubbish (it takes the UTF-8 encoded form of your String and interprets it as if it were ISO-8859-1, which accomplishes nothing useful).
  2. The ISO-8859-1 encoding (aka Latin1) doesn't contain the trademark character "™".
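
The practical consequence of the second point can be seen in a short sketch (class and method names are hypothetical). `String.getBytes` quietly substitutes the charset's replacement byte, `?`, for a character it cannot map, whereas a `CharsetEncoder` set to `CodingErrorAction.REPORT` throws instead, which makes the data loss visible.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: encode to ISO-8859-1 and fail loudly on unmappable characters.
class StrictLatin1 {

    // Encodes to ISO-8859-1, throwing instead of silently substituting '?'.
    static byte[] encodeStrict(String text) throws CharacterCodingException {
        CharsetEncoder encoder = StandardCharsets.ISO_8859_1.newEncoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer buffer = encoder.encode(CharBuffer.wrap(text));
        byte[] bytes = new byte[buffer.remaining()];
        buffer.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        String text = "\u2122"; // TRADE MARK SIGN, not part of ISO-8859-1

        // Lenient path: the unmappable character becomes the replacement byte.
        byte[] lossy = text.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println("getBytes gave " + lossy.length + " byte(s), first = " + (char) lossy[0]);

        // Strict path: the same input is rejected.
        try {
            encodeStrict(text);
            System.out.println("encoded without loss");
        } catch (CharacterCodingException e) {
            System.out.println("not representable in ISO-8859-1: " + e);
        }
    }
}
```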

I had a similar problem and solved it by converting the non-translatable characters into entities. If you display the information later as HTML, you are fine anyway.

If not, you could try to convert them back to Unicode.

Example in Python with the trademark sign:

s = u'yellow bananas\u2122'.encode('latin1', 'xmlcharrefreplace')
# s is 'yellow bananas&#8482;'
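
Since the question is about Java, here is a rough Java counterpart of the same idea. It is a sketch only: `NumericEntities` and `escapeNonLatin1` are made-up names, and, like the Python error handler, it leaves existing `&` characters untouched, so the result is only unambiguous if the input contains no entity-like text of its own.

```java
// Hypothetical helper: replace characters outside ISO-8859-1 with numeric character references.
class NumericEntities {

    // Replaces every code point above U+00FF with a decimal reference such as &#8482;.
    static String escapeNonLatin1(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {
            if (cp <= 0xFF) {
                out.append((char) cp);                    // representable in ISO-8859-1, keep as-is
            } else {
                out.append("&#").append(cp).append(';');  // decimal code point value
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeNonLatin1("yellow bananas\u2122"));
    }
}
```

The escaped string contains only Latin-1 characters, so it can be stored in the ISO-8859-1 column as-is and decoded back when read, or sent straight to an HTML page.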
