Displaying WINDOWS-1252-encoded text from a file as HTML

I have a text file with WINDOWS-1252 characters like ø and ß. The file is uploaded via a form submit to a servlet, where it's parsed with opencsv and returned as a List object to a JSP page, where it's displayed. These characters show up as ?, and I'm trying to figure out where along the way the encoding might have gone wrong. I've tried a bunch of things:

  • My page has the tag <%@page contentType="text/html" pageEncoding="WINDOWS-1252"%>

  • The file input is decoded with an explicit charset: new InputStreamReader(new FileInputStream(file), "WINDOWS-1252")

  • Every string is re-encoded: s = new String(s.getBytes("WINDOWS-1252"));

Where else can the encoding fail? Any ideas?
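The third bullet's round trip is worth a closer look: s.getBytes("WINDOWS-1252") encodes the String to windows-1252 bytes, and the one-argument new String(byte[]) decodes those bytes again with the JVM's default charset. The sketch below makes that default explicit (the class and the reencode helper are hypothetical names). Decoding with the same charset just returns the original string, and decoding with UTF-8, the default charset since JDK 18, turns ø into the replacement character U+FFFD.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Hypothetical helper mirroring s = new String(s.getBytes("WINDOWS-1252")),
    // except that the decoding charset is passed in explicitly; the one-argument
    // new String(byte[]) constructor uses the JVM default charset instead.
    static String reencode(String s, Charset decodeWith) {
        byte[] bytes = s.getBytes(CP1252); // o-slash becomes the single byte 0xF8
        return new String(bytes, decodeWith);
    }

    public static void main(String[] args) {
        String s = "\u00F8"; // the character o-slash
        // Same charset both ways: the string comes back unchanged.
        System.out.println(reencode(s, CP1252).equals(s)); // true
        // UTF-8 (the default charset since JDK 18): 0xF8 is not valid UTF-8,
        // so the decoder substitutes U+FFFD.
        System.out.println(reencode(s, StandardCharsets.UTF_8).equals("\uFFFD")); // true
    }
}
```

In neither case does the round trip change how the file itself was decoded.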

Some troubleshooting suggestions:

Print the text as hex (or inspect it some other way) at each stage, and verify that the encoding really is what you expect it to be.
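A minimal sketch of that kind of hex inspection, using only the JDK (the class and method names are hypothetical). For raw bytes, ø is F8 in windows-1252 but C3 B8 in UTF-8. For a decoded String, a correctly read ø shows up as U+00F8, while U+FFFD or U+003F ('?') means the character was already lost at an earlier stage.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class HexDump {
    // Formats raw bytes as space-separated two-digit hex, e.g. "C3 B8".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // Formats the UTF-16 code units of an already decoded String, e.g. "U+00F8".
    static String codeUnits(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("U+%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "\u00F8\u00DF"; // o-slash followed by sharp s
        System.out.println(codeUnits(s)); // U+00F8 U+00DF
        System.out.println(hex(s.getBytes(Charset.forName("windows-1252")))); // F8 DF
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_8)));          // C3 B8 C3 9F
    }
}
```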

Make sure there is no BOM (byte order mark). If there is one and you don't have an easy way to get rid of it, see this question and the links in it: Reading UTF-8 - BOM marker
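A BOM only matters if the file turns out to be UTF-8 (or UTF-16). Java's UTF-8 decoder does not strip it, so it surfaces as U+FEFF at the start of the first field. One way to drop it is sketched below: peek at the first three bytes and push them back unless they are EF BB BF. The class name is hypothetical, and readNBytes requires Java 9 or later.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.util.Arrays;

public class BomStripper {
    // The UTF-8 encoding of U+FEFF, as it appears at the start of a file.
    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Returns a stream that skips a leading UTF-8 BOM if present; otherwise
    // the peeked bytes are pushed back, so nothing is lost.
    static InputStream skipUtf8Bom(InputStream in) throws IOException {
        PushbackInputStream pb = new PushbackInputStream(in, UTF8_BOM.length);
        byte[] head = new byte[UTF8_BOM.length];
        int n = pb.readNBytes(head, 0, head.length); // returns 0 at EOF, never -1
        if (n != UTF8_BOM.length || !Arrays.equals(head, UTF8_BOM)) {
            pb.unread(head, 0, n);
        }
        return pb;
    }

    public static void main(String[] args) throws IOException {
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 0x61}; // BOM + "a"
        InputStream in = skipUtf8Bom(new ByteArrayInputStream(withBom));
        System.out.println((char) in.read()); // a
    }
}
```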

OK, the problem is fixed. The first problem was that it wasn't a UTF-8 file at all but a WINDOWS-1252 one. I determined that using the juniversalchardet library (very helpful and easy to use). Then I had to make sure I was reading the file with the right charset, by wrapping the FileInputStream in an InputStreamReader:

new InputStreamReader(new FileInputStream(file), "WINDOWS-1252")
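The charset argument belongs to an InputStreamReader wrapped around the FileInputStream, because FileInputStream only deals in bytes and has no charset parameter. Below is a self-contained sketch that writes "ø,ß" as windows-1252 bytes to a temporary file and reads it back as lines. The class and helper names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

public class ReadWithCharset {
    // Decodes the file's bytes with the given charset, one line at a time.
    static List<String> readLines(File file, Charset charset) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("sample", ".csv");
        tmp.deleteOnExit();
        try (FileOutputStream out = new FileOutputStream(tmp)) {
            out.write(new byte[] {(byte) 0xF8, ',', (byte) 0xDF}); // o-slash, comma, sharp s in windows-1252
        }
        List<String> lines = readLines(tmp, Charset.forName("windows-1252"));
        System.out.println(lines.get(0).equals("\u00F8,\u00DF")); // true
    }
}
```

In the setup from the question, the same InputStreamReader could be passed to opencsv's CSVReader, whose constructor accepts a java.io.Reader. For an uploaded file, the InputStream would presumably come from the upload rather than from a FileInputStream.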

Then I just had to make sure I was displaying it with the right charset in the JSP file, using the tag <%@page contentType="text/html" pageEncoding="WINDOWS-1252"%>
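On the output side, the decoded Strings are encoded once more with the response charset. Per the JSP specification, that charset comes from the charset in contentType and falls back to pageEncoding when contentType names none. The sketch below approximates that step, with a ByteArrayOutputStream standing in for the HTTP response body (the class and method names are hypothetical). It shows that the browser gets ø back when it decodes with the declared charset, and that a charset unable to represent ø makes the encoder write '?'.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class OutputCharsetDemo {
    // Approximates what happens to page output: the String is encoded with the
    // response charset. The ByteArrayOutputStream stands in for the response body.
    static byte[] encodeForResponse(String html, Charset responseCharset) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(body, responseCharset)) {
            w.write(html); // unmappable characters are replaced, not rejected
        }
        return body.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String cell = "\u00F8"; // o-slash, already decoded correctly when the file was read
        Charset cp1252 = Charset.forName("windows-1252");
        // The browser decodes with the declared charset: the character survives.
        System.out.println(new String(encodeForResponse(cell, cp1252), cp1252).equals(cell)); // true
        // US-ASCII cannot represent o-slash, so the encoder substitutes '?'.
        System.out.println(new String(encodeForResponse(cell, StandardCharsets.US_ASCII),
                StandardCharsets.US_ASCII)); // ?
    }
}
```

Because the text is already a decoded Java String at this point, declaring UTF-8 for the response would work just as well, provided the declared and actual encodings agree.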

That's pretty much it:

(1) Determine the charset.

(2) Make sure you're reading the file with it.

(3) Make sure you display it with the right charset.
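Step (1) was done with juniversalchardet in the answer above. As a JDK-only stand-in (a much cruder technique, not juniversalchardet), a strict UTF-8 decode can at least tell "valid UTF-8" apart from "something else". The fallback to windows-1252 is an assumption that fits the files in this question, not arbitrary input. Pure ASCII passes as UTF-8, which is harmless because ASCII is a subset of both charsets. The class name is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class GuessCharset {
    // Crude heuristic, NOT juniversalchardet: if the bytes decode as strict UTF-8,
    // assume UTF-8; otherwise assume windows-1252 (an assumption, see above).
    static Charset guess(byte[] data) {
        CharsetDecoder strictUtf8 = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            strictUtf8.decode(ByteBuffer.wrap(data));
            return StandardCharsets.UTF_8;
        } catch (CharacterCodingException e) {
            return Charset.forName("windows-1252");
        }
    }

    public static void main(String[] args) {
        byte[] cp1252 = {(byte) 0xF8, (byte) 0xDF};                        // o-slash, sharp s in windows-1252
        byte[] utf8 = {(byte) 0xC3, (byte) 0xB8, (byte) 0xC3, (byte) 0x9F}; // the same two characters in UTF-8
        System.out.println(guess(cp1252)); // windows-1252
        System.out.println(guess(utf8));   // UTF-8
    }
}
```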

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.

© 2020-2024 STACKOOM.COM