
Turkish characters are not shown properly on the HTML

I'm trying to print text to an HTML page from a servlet. I use UTF-8 encoding, but Turkish characters are not displayed correctly on the page.

How I define UTF-8 encoding:

  String htmlStart = "<html>\n" +
                     "<head><meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" /> <title>" + title + "</title></head>\n" +
                     "<body bgcolor = \"#f0f0f0\">\n" +
                     "<h1 align = \"center\">" + title + "</h1>\n" +
                     "<ul>\n" + "  <li><b>" + url + "</b>" + "</ul>\n";

How I print the words to the HTML:

 for (String token : parsed) {
     med += "<p>" + token + "</p>\n";
     System.out.println(token);
 }

What is written to the Eclipse console by the above code:

Muğla Sıtkı Koçman Üniversitesi

What I see in the generated HTML:

Mu?la S?tk? Koçman Üniversitesi

You are already specifying "charset=utf-8" for the generated HTML, so reading/rendering the data shouldn't be a problem in the browser (as you suggest).

But your console sample code is incorrect because it does not specify that UTF-8 is to be used. The default behavior will be to use the default encoding of your platform when creating the data, which is probably not what you want.

The simplest way to fix that in your sample code is to reassign System.out to a PrintStream that uses UTF-8 by calling System.setOut():

String text = "Muğla Sıtkı Koçman Üniversitesi";
System.out.println(text + " (default PrintStream)");         
System.setOut(new PrintStream(System.out, true, "UTF8"));
System.out.println(text + " (UTF-8 PrintStream)");  

However, if I run that code from the Windows Command Prompt I get this mess:

Mu?la S?tk? Koçman Üniversitesi (default PrintStream)

Muğla Sıtkı Koçman Üniversitesi (UTF-8 PrintStream)

The first line fails (like yours) because the data is being written and read using the default encoding, which is Cp437 on my machine. And the second line fails because although the data is being correctly written as UTF-8, it is still being rendered using Cp437.
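The '?' substitution described above can be reproduced without any console. This is only a minimal sketch, not the asker's code: it assumes ISO-8859-1 as a stand-in for a legacy default encoding, and EncodingDemo and writeThenRead are names made up for illustration. Like Cp437, ISO-8859-1 can encode ç and Ü but has no ğ or ı, so those two letters are replaced by '?' when the text is written.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // "Muğla Sıtkı Koçman Üniversitesi", written with Unicode escapes so the
    // result does not depend on the encoding used to compile this file.
    public static final String TEXT =
            "Mu\u011Fla S\u0131tk\u0131 Ko\u00E7man \u00DCniversitesi";

    // Encode the text with one charset and decode the bytes with another,
    // mimicking a writer and a reader that may disagree about the encoding.
    public static String writeThenRead(String text, Charset writeAs, Charset readAs) {
        byte[] bytes = text.getBytes(writeAs);
        return new String(bytes, readAs);
    }

    public static void main(String[] args) {
        // Assumption: ISO-8859-1 stands in for the platform default encoding.
        Charset legacy = StandardCharsets.ISO_8859_1;
        Charset utf8 = StandardCharsets.UTF_8;

        // Written and read with the legacy charset: the letters it cannot
        // encode (the g-breve and the dotless i) are replaced by '?'.
        String lossy = writeThenRead(TEXT, legacy, legacy);
        System.out.println(lossy.equals("Mu?la S?tk? Ko\u00E7man \u00DCniversitesi")); // true

        // Written and read as UTF-8: nothing is lost.
        System.out.println(writeThenRead(TEXT, utf8, utf8).equals(TEXT)); // true

        // Written as UTF-8 but read with the legacy charset: the bytes are
        // intact, yet they are rendered as the wrong characters.
        System.out.println(writeThenRead(TEXT, utf8, legacy).equals(TEXT)); // false
    }
}
```

The booleans are printed instead of the strings themselves so that the result does not depend on how the console decodes the program's output.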

To fix that, explicitly set your console's code page to UTF-8 by specifying chcp 65001 in the console before running your code (on Windows at least). Then you will see that the second line renders correctly, because it is both written and read as UTF-8:

Mu?la S?tk? Koman niversitesi (default PrintStream)

Muğla Sıtkı Koçman Üniversitesi (UTF-8 PrintStream)

Notes:

  • You don't show how the generated HTML is created in your servlet, but if you ensure that it is being written as UTF-8 you should be OK.
  • If you are still stuck, update your question to show the full source of your servlet, and if that is too large, create a Minimal, Reproducible Example.
  • I think it's unhelpful that Eclipse is doing things behind the scenes to allow the output to render correctly in its Console. I'm not sure why the Eclipse team decided to do that, because it masks an underlying issue in your code.

Thank you for your reply; it helped me solve the problem. As with most of my bug hunts, the effort of creating a minimal, reproducible example helped once again. I found the culprit, and after removing it my code worked as I wished, although I still have no idea how it affects UTF-8 encoding:

response.getWriter().append("Served at: ").append(request.getContextPath()); 

You can see all the minimal, reproducible servlet code below:

import java.io.IOException;
import java.io.PrintStream;
import java.io.PrintWriter;

import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
   
/**
 * Servlet implementation class MyServlet
 */
@WebServlet("/Minimal")
public class Minimal extends HttpServlet {
    private static final long serialVersionUID = 1L;
       
    /**
     * @see HttpServlet#HttpServlet()
     */
    public Minimal() {
        super();
        // TODO Auto-generated constructor stub
    }

    /**
     * @see HttpServlet#doGet(HttpServletRequest request, HttpServletResponse response)
     */
    protected void doGet(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException{
        // TODO Auto-generated method stub
         
        System.setOut(new PrintStream(System.out, true, "UTF8"));
        
        
        // The culprit line, which breaks UTF-8 output when enabled
        //response.getWriter().append("Served at: ").append(request.getContextPath());
        
        response.setContentType("text/html; charset=utf-8");
        response.setCharacterEncoding("UTF-8");
        PrintWriter out = response.getWriter();
         
        String title = "GEMED Software Requirements";
        String docType = "<!doctype html public \"-//w3c//dtd html 4.0 " +
                "transitional//tr\">\n";

        String url = request.getParameter("site11");

        String htmlStart = docType + "<html>\n" +
                "<head><meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" /> <title>" + title + "</title></head>\n" +
                "<body bgcolor = \"#f0f0f0\">\n" +
                "<h1 align = \"center\">" + title + "</h1>\n" +
                "<ul>\n" + "  <li><b>" + url + "</b>" + "</ul>\n";

        String med2 = "Türkçe: Muğla Sıtkı Koçman Üniversitesi Rektörü Çalışmayan Öğrencilerden Şikayetçi......";
        String htmlEnd = "</body>" + "</html>";

        out.println(htmlStart + med2 + htmlEnd);
    
    }

    /**
     * @see HttpServlet#doPost(HttpServletRequest request, HttpServletResponse response)
     */
    protected void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
        // TODO Auto-generated method stub
        doGet(request, response);
    }

}
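
A likely explanation for the culprit line, offered as a sketch rather than a diagnosis of this particular container: the Servlet API documents that setCharacterEncoding() has no effect once getWriter() has been called, and that the writer uses ISO-8859-1 when no encoding has been specified. Calling response.getWriter() first would therefore freeze the response at ISO-8859-1, which cannot encode ğ or ı. The class below is a hypothetical, heavily simplified model of that one rule. ResponseModel, serve and bodyText are made-up names, not part of the Servlet API, and a real container does much more.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a servlet response. It models only one rule:
// the character encoding is frozen by the first getWriter() call.
public class ResponseModel {
    private final ByteArrayOutputStream body = new ByteArrayOutputStream();
    // Assumption: no container-level override, so the default is ISO-8859-1.
    private Charset encoding = StandardCharsets.ISO_8859_1;
    private PrintWriter writer;

    public void setCharacterEncoding(String name) {
        if (writer == null) {           // silently ignored once a writer exists
            encoding = Charset.forName(name);
        }
    }

    public PrintWriter getWriter() {
        if (writer == null) {           // the encoding in force now is locked in
            writer = new PrintWriter(new OutputStreamWriter(body, encoding));
        }
        return writer;
    }

    // Decode the bytes written so far using the encoding that was locked in.
    public String bodyText() {
        getWriter().flush();
        return new String(body.toByteArray(), encoding);
    }

    // Replay the two orderings from the servlet above.
    public static String serve(boolean touchWriterFirst, String text) {
        ResponseModel response = new ResponseModel();
        if (touchWriterFirst) {
            response.getWriter().append("Served at: ");  // the culprit pattern
        }
        response.setCharacterEncoding("UTF-8");
        response.getWriter().print(text);
        return response.bodyText();
    }

    public static void main(String[] args) {
        String text = "Mu\u011Fla S\u0131tk\u0131";      // "Muğla Sıtkı"
        // Writer obtained before the encoding is set: the text is degraded.
        System.out.println(serve(true, text).equals("Served at: Mu?la S?tk?")); // true
        // Encoding set first: the text survives unchanged.
        System.out.println(serve(false, text).equals(text));                    // true
    }
}
```

If this model matches what the container does, the ordering that worked in the servlet follows directly: set the content type and character encoding first, and only then call getWriter().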
