
Displaying WINDOWS-1252 encoded text from a file as HTML

I have a text file with WINDOWS-1252 characters like ø and ß. The file is uploaded via form submit to a servlet, where it is parsed with opencsv and returned as a List object to a JSP page that displays it. The non-ASCII characters are displayed as ? and I'm trying to figure out where along the way the encoding might have gone wrong. I've tried a bunch of things:

  • My page has the tag <%@page contentType="text/html" pageEncoding="WINDOWS-1252"%>

  • The file input is read with an explicit charset - new FileInputStream(file), "WINDOWS-1252")

  • Every string is re-encoded - s = new String(s.getBytes("WINDOWS-1252"));

Where else can the encoding fail? Any ideas?

Some troubleshooting suggestions:

Debug-print or otherwise examine the text as hex at various stages, and verify that the encoding really is what you expect it to be.
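One way to do that hex check is sketched below. The class and method names (EncodingDebug, toHex, codeUnits) are made up for illustration and are not from the original post. The idea is to print both what the String holds in memory and what its bytes look like in a given charset, so you can see at which stage ø (U+00F8) or ß (U+00DF) turns into 3F, the code for "?".

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for inspecting text at each stage of the pipeline.
class EncodingDebug {

    /** Formats bytes as space-separated uppercase hex, e.g. "F8 DF". */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** Formats the UTF-16 code units a String holds in memory. */
    static String codeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%04X", (int) s.charAt(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        String text = "\u00F8\u00DF"; // ø and ß

        System.out.println(codeUnits(text));                                 // 00F8 00DF
        System.out.println(toHex(text.getBytes(cp1252)));                    // F8 DF
        System.out.println(toHex(text.getBytes(StandardCharsets.UTF_8)));    // C3 B8 C3 9F
        // A charset that cannot represent the characters substitutes '?' (3F).
        System.out.println(toHex(text.getBytes(StandardCharsets.US_ASCII))); // 3F 3F
    }
}
```

If the code units are already 003F right after reading the file, the damage happened while decoding the input. If they are still 00F8 and 00DF just before rendering, look at the response encoding instead.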

Make sure there is no BOM (byte order mark). If there is one and you don't have an easy way to get rid of it, see this question and the links in it: Reading UTF-8 - BOM marker
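A minimal sketch of the BOM check, assuming the uploaded file's content is available as a byte array. The names BomCheck, hasUtf8Bom and stripUtf8Bom are hypothetical, not part of any library. It only looks for the UTF-8 BOM (bytes EF BB BF); if those bytes are decoded as WINDOWS-1252 they show up as "ï»¿" at the start of the text.

```java
import java.util.Arrays;

// Hypothetical helper: detects and removes a leading UTF-8 BOM.
class BomCheck {

    /** True if the data starts with the UTF-8 BOM bytes EF BB BF. */
    static boolean hasUtf8Bom(byte[] data) {
        return data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF;
    }

    /** Returns the data without a leading UTF-8 BOM; unchanged if there is none. */
    static byte[] stripUtf8Bom(byte[] data) {
        return hasUtf8Bom(data) ? Arrays.copyOfRange(data, 3, data.length) : data;
    }

    public static void main(String[] args) {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'a', 'b'};
        byte[] plain = {'a', 'b'};

        System.out.println(hasUtf8Bom(withBom));             // true
        System.out.println(hasUtf8Bom(plain));               // false
        System.out.println(stripUtf8Bom(withBom).length);    // 2
    }
}
```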

OK, the problem is fixed. The first problem was that it wasn't a UTF-8 file at all but a WINDOWS-1252 one. I determined that using the juniversalchardet lib (very helpful and easy to use). Then I had to make sure that I'm reading the file with the right charset by using a FileInputStream:

new FileInputStream(file), "WINDOWS-1252")
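The fragment above is cut off. A FileInputStream only delivers raw bytes and takes no charset, so the charset argument presumably belongs to a wrapping reader. Here is a self-contained sketch, assuming that wrapper is an InputStreamReader; the class name ReadCp1252 and the method readAll are invented for the example. The resulting reader could then be handed to the CSV parser.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.file.Files;

// Hypothetical example: decode a file's bytes with an explicit charset.
class ReadCp1252 {

    /** Reads the whole file, decoding its bytes with the given charset. */
    static String readAll(File file, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), charset))) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Write the WINDOWS-1252 bytes for "ø,ß" to a temporary file.
        File file = File.createTempFile("cp1252-demo", ".csv");
        file.deleteOnExit();
        Files.write(file.toPath(), new byte[] {(byte) 0xF8, ',', (byte) 0xDF});

        String text = readAll(file, Charset.forName("windows-1252"));
        System.out.println(text.equals("\u00F8,\u00DF")); // true
    }
}
```

Decoding the same bytes as UTF-8 would not give ø and ß, because F8 and DF on their own are not valid UTF-8 sequences. That is why detecting the real charset first matters.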

Then I just had to make sure that I am displaying it with the right charset in the JSP file using the tag <%@page contentType="text/html" pageEncoding="WINDOWS-1252"%>

That's pretty much it:

(1) determine the charset

(2) make sure you're reading the file right

(3) make sure you display it right

