
How to import txt file with UTF-8 encoding into jsp?

Hey, I'm working on a web application and have problems reading UTF-8 characters from .txt files. I got UTF-8 working for the pages themselves (UTF-8 web encoding), and it works fine everywhere except the import. I tried a lot of things (especially from "read UTF-8 string literal java"), but nothing worked and I have no idea why.

The important code snippets:

import.jsp

<%@ page language="java" contentType="text/html; charset=UTF-8"
pageEncoding="UTF-8"%>
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="fi">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Import</title>
<link rel="stylesheet" href="//code.jquery.com/ui/1.12.1/themes/base/jquery-ui.css">
<script src="https://code.jquery.com/jquery-1.12.4.js"></script>
<script src="https://code.jquery.com/ui/1.12.1/jquery-ui.js"></script>
<script src="script.js"></script>
<link rel="stylesheet" type="text/css"
media="screen and (min-device-width: 500px)" href="style.css" />
<link rel="stylesheet"
href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/4.7.0/css/font-awesome.min.css">
</head>
<body>
<form>
    <!-- show import data -->
</form>
<form id="importForm" action="${pageContext.request.contextPath}/ImportData" method="post" onsubmit="return importValidation();" enctype="multipart/form-data">
    <input type="file" name="file" accept=".txt"/>
    <input type="submit" value="Import">
</form>

</body>
</html>

ImportData Servlet:

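import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import javax.servlet.ServletException;
import javax.servlet.annotation.MultipartConfig;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.Part;

@WebServlet("/ImportData")
@MultipartConfig
public class ImportData extends HttpServlet {

    @Override
    protected void doPost(HttpServletRequest request, HttpServletResponse response) throws ServletException, IOException {
        Part filePart = request.getPart("file"); // Retrieves <input type="file" name="file">
        List<String> texts = new ArrayList<>();
        // Decode the uploaded bytes as UTF-8 (only correct if the file really is UTF-8 encoded);
        // try-with-resources closes the reader even if an exception is thrown
        try (BufferedReader buf = new BufferedReader(
                new InputStreamReader(filePart.getInputStream(), StandardCharsets.UTF_8))) {
            String lineJustFetched;
            while ((lineJustFetched = buf.readLine()) != null) {
                // Split each line on tabs and collect every field
                texts.addAll(Arrays.asList(lineJustFetched.split("\t")));
            }
        }

        System.out.println(texts);

        //create Import Data in Backend and write it into db

        response.sendRedirect("import.jsp");
    }
}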
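To experiment with the decoding outside Tomcat, the parsing loop from the servlet can be pulled into a plain method that takes the charset as a parameter. This is a minimal sketch; the class `TabFieldReader` and method `readFields` are hypothetical names, not part of the original code. In the servlet it would be called as `readFields(filePart.getInputStream(), StandardCharsets.UTF_8)`.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: same logic as the servlet loop, but testable without a container.
public class TabFieldReader {

    // Reads every line, decoding bytes with the given charset,
    // and returns all tab-separated fields in order.
    public static List<String> readFields(InputStream in, Charset charset) throws IOException {
        List<String> fields = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = reader.readLine()) != null) {
                fields.addAll(Arrays.asList(line.split("\t")));
            }
        }
        return fields;
    }

    public static void main(String[] args) throws IOException {
        // Unicode escapes keep the source independent of the compiler's file encoding
        byte[] data = "K\u00e4se\t\u00d6l\n\u00c4pfel".getBytes(StandardCharsets.UTF_8);
        List<String> fields = readFields(new ByteArrayInputStream(data), StandardCharsets.UTF_8);
        System.out.println(fields.size());                     // 3
        System.out.println(fields.get(0).equals("K\u00e4se")); // true
    }
}
```

The comparison is printed as a boolean on purpose: printing the non-ASCII strings themselves would depend on the console encoding, which is a separate source of "squares".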

System details: Tomcat 7 with Java 1.7

When I print `texts`, the UTF-8 characters show up as a square, and in the HTML inputs (and texts) a � appears instead of the UTF-8 characters.
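This symptom matches what happens when bytes that are not valid UTF-8 are decoded as UTF-8: `InputStreamReader` (like `new String(bytes, charset)`) silently replaces each malformed sequence with U+FFFD, the Unicode replacement character, which browsers render as � and a console may render as a box. The sketch below reproduces this under the assumption that the file was written as Windows-1252; the class name `ReplacementCharDemo` is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReplacementCharDemo {

    // Encodes text as Windows-1252 ("ANSI") and decodes those bytes as UTF-8,
    // mimicking an ANSI file read through a UTF-8 InputStreamReader.
    public static String misdecode(String text) {
        byte[] ansiBytes = text.getBytes(Charset.forName("windows-1252"));
        return new String(ansiBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String decoded = misdecode("K\u00e4se"); // "Käse"
        // Print code points so the output does not depend on the console encoding;
        // the byte 0xE4 (ä in Windows-1252) is not valid UTF-8 here and becomes U+FFFD
        for (int i = 0; i < decoded.length(); i++) {
            System.out.printf("U+%04X%n", (int) decoded.charAt(i));
        }
    }
}
```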

So my question is: where and why do I lose the UTF-8 encoding?

OK, I didn't look closely enough... The file is not UTF-8 encoded (it is ANSI encoded). With a UTF-8 encoded file, this code works fine.
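One related caveat when switching the file to UTF-8: some Windows editors (for example older versions of Notepad) save "UTF-8" with a byte order mark, and Java's UTF-8 decoder does not strip it, so the first field would start with the invisible character U+FEFF. A minimal sketch of dropping it while reading, assuming the input is UTF-8; `BomAwareReader` is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class BomAwareReader {

    private static final char BOM = '\uFEFF';

    // Reads all lines as UTF-8 and removes a leading byte order mark, which the
    // UTF-8 decoder passes through as the character U+FEFF instead of dropping it.
    public static List<String> readLines(InputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            boolean first = true;
            while ((line = reader.readLine()) != null) {
                if (first && !line.isEmpty() && line.charAt(0) == BOM) {
                    line = line.substring(1);
                }
                first = false;
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // EF BB BF is the UTF-8 encoding of the byte order mark
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 'a', '\t', 'b'};
        List<String> lines = readLines(new ByteArrayInputStream(withBom));
        System.out.println(lines.get(0).equals("a\tb")); // true
    }
}
```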

To read a file in another encoding, you only have to change the encoding passed to the InputStreamReader so that it matches the file.

e.g.

 BufferedReader buf = new BufferedReader(
         new InputStreamReader(filePart.getInputStream(), "Cp1252"));

(for Windows "ANSI", i.e. Windows-1252, the default code page on Western-European Windows systems)
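If users may upload either UTF-8 or ANSI files, one option is to read the whole upload, try a strict UTF-8 decode first, and fall back to Windows-1252 only when the bytes are not valid UTF-8. This is a heuristic sketch, not a reliable encoding detector: it assumes only those two encodings occur, and some Windows-1252 text can happen to be valid UTF-8. The class `FallbackDecoder` and its methods are hypothetical; `readAll` exists because Java 7 has no `InputStream.readAllBytes()`.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class FallbackDecoder {

    // Assumption: uploads are either UTF-8 or Windows-1252 ("ANSI").
    private static final Charset ANSI = Charset.forName("windows-1252");

    // Reads the whole stream into memory (Java 7 compatible).
    public static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    // Decodes strictly as UTF-8 (malformed input throws instead of becoming U+FFFD);
    // if that fails, decodes the same bytes as Windows-1252.
    public static String decode(byte[] bytes) {
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            return new String(bytes, ANSI);
        }
    }

    public static void main(String[] args) throws IOException {
        String text = "K\u00e4se";
        byte[] ansiBytes = text.getBytes(ANSI);
        String decoded = decode(readAll(new ByteArrayInputStream(ansiBytes)));
        System.out.println(decoded.equals(text)); // true
    }
}
```

In the servlet this would replace the fixed-charset reader, e.g. `String content = FallbackDecoder.decode(FallbackDecoder.readAll(filePart.getInputStream()));`, after which `content` can be split into lines and then on `"\t"` as before.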
