CSV special characters issue for non-UTF-8 encoding

Hi, I am using a Spring MVC application to process Excel and CSV files. I have run into an issue with special characters: a value such as DèéêàáâÉ is converted to D once it has been processed, which is wrong. However, when the CSV file is encoded as UTF-8, the special characters are converted correctly.

Part of the ajax call is shown below:

$('#fileuploading').fileupload({
    url: 'uploadFile',
    dataType: 'json',
    acceptFileTypes: /(\.|\/)(csv|xlsx)$/i,
    maxFileSize: 10000000,
    autoUpload: false,
    disableImageLoad: true,
    disableAudioPreview: true,
    disableVideoPreview: true,
    disableValidation: false,
    disableImageResize: true
});

My controller method is shown below:

@RequestMapping(value = "/uploadFile", method = RequestMethod.POST)
public @ResponseBody List<JSONResult> uploadFileHandler(
        @RequestParam("files") MultipartFile file, HttpServletRequest request) {
    logger.info("Starting upload of file: " + file.getOriginalFilename());
    JSONResult result = null;

    try {
        result = uploadFile(file, appUserDTO, result, request);
    } catch (IllegalStateException | IOException e) {
        // Pass the exception itself so the logger records the stack trace;
        // concatenating e.getStackTrace() only prints the array's identity.
        logger.error(e.getMessage(), e);
        errorLogService.saveErrorLog("FileUploadController: uploadFileHandler. Error: " + e.getMessage(), appUserDTO.getUser().getUsrUid());
    }

    List<JSONResult> array = new ArrayList<>();
    array.add(result);
    return array;
}

The method that processes the file is shown below:

public CsvFileReader(String path, String delimeter, File file) throws FileNotFoundException {
    String line = "";
    rows = new ArrayList<>();
    try (BufferedReader br = new BufferedReader(new InputStreamReader(new FileInputStream(path), "UTF8"))) {
        while ((line = br.readLine()) != null) {
            String[] lineData = line.split(delimeter, -1);
            if (SanityCheck.isValid(lineData)) {
                rows.add(lineData);
            }
        }
    } catch (IOException e) {
        logger.error(e.getMessage());
    }
}

Can anyone point me in the right direction on how to solve this, please?

Your program reads the files as UTF-8, so the files need to be in UTF-8; it won't work if they aren't.
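To make the mismatch concrete, here is a minimal sketch of what happens when bytes written in another encoding are decoded as UTF-8. It assumes the non-UTF-8 file was saved as windows-1252 (the question does not say which encoding was used), and the class and method names are made up for illustration. What the application does with the resulting replacement characters afterwards is not shown in the question.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class EncodingMismatchDemo {

    // Decodes raw bytes as UTF-8, the same way the reader in the question does.
    static String decodeAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "Dèé" written with Unicode escapes so this source file's own encoding is irrelevant.
        String original = "D\u00e8\u00e9";

        // Assumption: the uploaded file was saved as windows-1252.
        byte[] cp1252Bytes = original.getBytes(Charset.forName("windows-1252"));

        String decoded = decodeAsUtf8(cp1252Bytes);

        // The accented letters are not valid UTF-8 byte sequences here, so the
        // decoder substitutes U+FFFD (the replacement character) for them.
        System.out.println(decoded.equals(original));       // false
        System.out.println(decoded.indexOf('\uFFFD') >= 0); // true
    }
}
```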

If you're asking how to handle files that can be in any encoding: the encoding of a file cannot be reliably guessed, so you need to tell the server the file's encoding when you upload it, using extra information such as a form field indicating the encoding.
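As a rough sketch of that idea, the reader could take the charset name from the request instead of hard-coding UTF-8. Everything named here is an assumption for illustration: the class `DeclaredCharsetCsvReader`, the method `readRows`, and the idea that the client sends the name in a form field (for example one called `encoding`).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

class DeclaredCharsetCsvReader {

    // Reads delimiter-separated rows, decoding the stream with the charset
    // name the client declared (hypothetical "encoding" form field).
    static List<String[]> readRows(InputStream in, String charsetName, String delimiter) {
        // Charset.forName throws an IllegalArgumentException subtype for unknown
        // or malformed names, so a bad value fails fast instead of corrupting text.
        Charset charset = Charset.forName(charsetName);
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new InputStreamReader(in, charset))) {
            String line;
            while ((line = br.readLine()) != null) {
                // As in the question, the delimiter is treated as a regex by split.
                rows.add(line.split(delimiter, -1));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return rows;
    }
}
```

In the controller, this might be wired up by adding a parameter such as `@RequestParam("encoding") String encoding` and passing `file.getInputStream()` together with that value. On the client side, assuming the upload widget is the blueimp jQuery File Upload plugin, its `formData` option is one way to send the extra field.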

If you're asking how to handle files that can be in any encoding when you don't know where to obtain the encoding of a file, because the files are just stashed there and you're not aware of any listing of the encoding of each of them: well, like I said, it cannot be guessed.

If you feel like it, you can attempt to guess the encoding of the file by reading it as UTF-8 first and checking whether the result contains invalid characters. If not, reading it as UTF-8 was most likely correct. If there are invalid characters, then UTF-8 was probably not the correct encoding and you should try another. That other encoding may be windows-1252... and it may be something else entirely. No way to know, really.
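One possible way to implement this guess in Java is to decode strictly as UTF-8 and fall back only when the decoder reports malformed input. This is a heuristic sketch, not a reliable detector: the class name is hypothetical, and windows-1252 as the fallback is an assumption taken from the suggestion above. Some non-UTF-8 byte sequences happen to be valid UTF-8, in which case the guess is silently wrong.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8WithFallbackDecoder {

    // Assumption: windows-1252 is the most likely alternative for these uploads.
    static final Charset FALLBACK = Charset.forName("windows-1252");

    static String decode(byte[] bytes) {
        try {
            // REPORT makes the decoder throw on invalid bytes instead of
            // quietly substituting U+FFFD, which is what lets us detect a mismatch.
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8: try the fallback encoding instead.
            return new String(bytes, FALLBACK);
        }
    }
}
```

With Spring's `MultipartFile`, the bytes could come from `file.getBytes()`; the decoded text can then be split into lines and columns as before. Reading the whole upload into memory is acceptable for small files, but a streaming variant would be needed for large ones.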
