Why is my UTF-8 encoded data not staying "UTF-8" encoded?

The problem I'm trying to fix is this: users of our application copy and paste characters from Windows documents such as Word, and our application does not recognize the single quotes, double quotes, or bullets.

These are the steps I've taken so far to get this data into UTF-8 format:

  1. Inside server.xml, in the Connector tag, I added the attribute URIEncoding="UTF-8".

  2. In the bean charged with storing the input, I created a byte[] from the String holding the inputNote text, encoded as UTF-8, then converted those bytes back into a String and assigned it to the inputNoteText String. Condensed code:

     byte[] bytesInUTF8inputNoteText = inputNoteText.getBytes("UTF-8");
     inputNoteText = new String(bytesInUTF8inputNoteText, "UTF-8");
     this.var = inputNoteText;

  3. In the setter that holds the result of the db query, setNoteText(noteText), I convert the note data coming from the database into bytes in UTF-8 format, convert those bytes back into a String, and assign it to the String noteText property:

     public void setNoteText(String noteText) throws UnsupportedEncodingException {
         byte[] bytesInUTF8inputNoteText = noteText.getBytes("UTF-8");
         String noteTextUTF8 = new String(bytesInUTF8inputNoteText, "UTF-8");
         this.noteText = noteTextUTF8;
     }

  4. In SQL Server I changed the column's data type from text to nvarchar(MAX) to store the data as Unicode, even though that is a different Unicode encoding.
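
Steps 2 and 3 encode the String to UTF-8 bytes and immediately decode the same bytes as UTF-8. For any well-formed String that round trip returns an equal String, so it cannot change or repair the text. A minimal sketch that checks this; the class name, method name, and sample text are illustrative and not taken from the application:

```java
import java.nio.charset.StandardCharsets;

class Utf8RoundTripDemo {
    // Same operation as steps 2 and 3: encode to UTF-8, then decode as UTF-8.
    static String roundTrip(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical sample: curly quotes and a bullet, as pasted from Word.
        String pasted = "\u201Cquoted\u201D \u2022 item";
        // The round trip leaves the text unchanged.
        System.out.println(roundTrip(pasted).equals(pasted)); // prints "true"
    }
}
```

If the String was already correct, it stays correct; if it was already damaged, it stays damaged. The charset only matters at the points where bytes actually enter or leave the JVM.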

What I see when I copy/paste from an MS Word doc into our JSF input textbox:

In Eclipse, if I set a watch on the property in the bean, all characters look correct once the data in that String property has been converted to UTF-8. When I post to SQL Server, the string held in the nvarchar(max) column shows all characters correctly. When the resultSet is returned and the holding property is populated with the String from the db query, it also shows everything correctly.

BUT somewhere between the correct String value sitting in the property bound to the JSF page and the rendered JSF page (JSF 1.2, by the way), the value is being mangled, so I see question marks where I should see single/double quotes and bullet points. I hope someone has run into this type of issue before and can shed some light on what I need to do to fix it. It seems kind of like a JSF bug. Thanks in advance for your input!
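
One common way a correct String ends up as question marks is being encoded for output in a charset that lacks those characters. The sketch below assumes the page is written out as ISO-8859-1, which the question does not confirm; the class and method names are illustrative:

```java
import java.nio.charset.StandardCharsets;

class QuestionMarkDemo {
    // Encode text as ISO-8859-1 and decode it again, as a Latin-1 response would.
    // Characters that ISO-8859-1 cannot represent are replaced with '?'.
    static String throughLatin1(String text) {
        byte[] latin1Bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(latin1Bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Curly double quotes (U+201C, U+201D) and a bullet (U+2022).
        String pasted = "\u201Chi\u201D \u2022";
        System.out.println(throughLatin1(pasted)); // prints "?hi? ?"
    }
}
```

Once a character has been replaced with '?' this way, the original cannot be recovered from the output, so the fix has to happen where the output charset is chosen.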

Try this:

noteText = new String(noteText.getBytes("iso-8859-1"), "UTF-8");
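
That conversion only helps in one situation: the text arrived as UTF-8 bytes but was decoded as ISO-8859-1 somewhere along the way. A minimal sketch of that case, with illustrative class and method names and a simulated fault:

```java
import java.nio.charset.StandardCharsets;

class Latin1RepairDemo {
    // Simulates the fault: UTF-8 bytes wrongly decoded as ISO-8859-1.
    static String misdecode(String text) {
        return new String(text.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // The one-liner above, using charset constants instead of charset names.
    static String repair(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u201Cquoted\u201D";
        String garbled = misdecode(original);
        System.out.println(garbled.equals(original));         // prints "false"
        System.out.println(repair(garbled).equals(original)); // prints "true"
    }
}
```

Applied to a String that is already correct, the same line does damage instead, because characters outside ISO-8859-1 are turned into '?' during the getBytes call.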

When you copy and paste from Windows documents, the encoding format is not UTF-8 but [Windows-1252](http://en.wikipedia.org/wiki/Windows-1252). Note the cells marked with thick green borders in the table there. These characters don't map to the UTF-8 charset, so you will have to use the Windows-1252 encoding when reading.
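
As a rough illustration of reading with Windows-1252, the sketch below decodes three bytes from that code page's 0x80-0x9F range. It assumes the raw bytes really are Windows-1252, which would need to be checked in the application; the class name and byte values are illustrative:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class Windows1252Demo {
    static final Charset WINDOWS_1252 = Charset.forName("windows-1252");

    // Decode raw bytes using the Windows-1252 code page.
    static String decode(byte[] raw) {
        return new String(raw, WINDOWS_1252);
    }

    public static void main(String[] args) {
        // 0x93 and 0x94 are curly double quotes, 0x95 is a bullet in Windows-1252.
        byte[] raw = {(byte) 0x93, (byte) 0x94, (byte) 0x95};
        String text = decode(raw);
        System.out.println(text.equals("\u201C\u201D\u2022")); // prints "true"

        // The same bytes read as ISO-8859-1 become control characters instead.
        String asLatin1 = new String(raw, StandardCharsets.ISO_8859_1);
        System.out.println(asLatin1.equals(text)); // prints "false"
    }
}
```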
