How to set UTF-8 for file upload in Java?

I have the following function to get the uploaded files:

public static Map<Integer, Map<String, byte[]>> getFiles(IMultipartBody bimp) {
        List<IAttachment> parts = bimp.getAllAttachments();
        Iterator<IAttachment> it = parts.iterator();
        ByteArrayOutputStream baos = null;
        InputStream inputStream = null;
        String fileName = null;
        byte[] bytes = null;

        Map<Integer, Map<String, byte[]>> files = new HashMap<Integer, Map<String, byte[]>>();
        Map<String, String> duplicateFileMap = new HashMap<String, String>();
        int counter = 0;

        while (it.hasNext()) {
            try {
                IAttachment name = it.next();
                // Reset per attachment so a part without a filename does not reuse the previous part's name
                fileName = null;
                MultivaluedMap<String, String> headers = name.getHeaders();

                if (headers.get("Content-Disposition") != null
                        && !headers.get("Content-Disposition").isEmpty()) {
                    String header = headers.get("Content-Disposition").get(0);
                    String[] dispositions = header.split(";");
                    for (String disposition : dispositions) {
                        if (disposition.indexOf("filename") != -1) {
                            String tmpStr = disposition.substring(
                                    disposition.indexOf("=") + 1,
                                    disposition.length()).replaceAll("\"",
                                    Constant.EMPTY);
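                            // Note: encoding to UTF-8 and decoding with UTF-8 again returns tmpStr unchanged,
                            // so this round trip cannot undo a wrong decoding done earlier by the container.
                            ByteBuffer byteBuffs = StandardCharsets.UTF_8.encode(tmpStr);
                            fileName = StandardCharsets.UTF_8.decode(byteBuffs).toString();
                            // Earlier attempt; getBytes() without an argument uses the platform default charset
//                          fileName = new String(tmpStr.getBytes(), Charset.forName("UTF-8"));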

                        }
                    }
                }

                inputStream = name.getDataHandler().getInputStream();
                baos = new ByteArrayOutputStream();
                // Copy the stream in chunks rather than one byte per read() call
                byte[] buffer = new byte[8192];
                int read;
                while ((read = inputStream.read(buffer)) != -1) {
                    baos.write(buffer, 0, read);
                }
                bytes = baos.toByteArray();
                if (bytes.length < 1) {
                    continue;
                }

                Map<String, byte[]> file = new HashMap<String, byte[]>();
                if (fileName != null ){
                    // Fix for firefox, remove '/'
                    if (fileName.startsWith("/")){
                        fileName = fileName.substring(1);
                    }

                    // Fix for IE, remove physical address, only get file name
                    if (fileName.lastIndexOf("\\") != -1 ){
                        fileName = fileName.substring(fileName.lastIndexOf("\\") + 1);
                    }
                }

                String md5 = generateMD5CheckSum(bytes);
                if (duplicateFileMap.containsKey(md5)
                        && duplicateFileMap.get(md5).equalsIgnoreCase(fileName)){
                    continue;
                }
                counter++;
                file.put(fileName, bytes);
                duplicateFileMap.put(md5,fileName);
                files.put(Integer.valueOf(counter), file);

            } catch (IOException e) {
                e.printStackTrace();
                LOGGER.error(e.getMessage());
            } finally {
                try {
                    if (inputStream != null) {
                        inputStream.close();
                    }

                    if (baos != null) {
                        baos.close();
                    }

                } catch (IOException e) {
                    e.printStackTrace();
                    LOGGER.error(e.getMessage());
                }
            }
        }
        return files;
    }

But when I debug an upload of a file named ALMS_ขั้นตอนลงทะเบียน.pdf (the name is in Thai), the attachment's headers look like this:

{Content-Disposition=[form-data; name="file"; filename="ALMS_ขั้นตà¸à¸™à¸¥à¸‡à¸—ะเบียน.pdf"], Content-Type=[application/pdf], Content-ID=[root.message@cxf.apache.org]}

I think the IMultipartBody is not set to UTF-8 before the file is uploaded. Can anyone help me resolve this problem? Thanks.
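The garbled name fits a common pattern: UTF-8 bytes decoded as ISO-8859-1 (Latin-1), which turns each Thai character into three Latin-1 characters starting with "à¸" or "à¹". The minimal sketch below assumes exactly that; it is a guess from the header dump, not something CXF is confirmed to do here. If windows-1252 was used instead, bytes that charset leaves undefined may already be lost and this repair would be incomplete. MojibakeDemo and its methods are hypothetical names for illustration. The sketch also shows why the UTF_8.encode/UTF_8.decode round trip in the question has no effect.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of any library.
class MojibakeDemo {

    // Simulates the suspected fault: the UTF-8 bytes of the header are decoded
    // as ISO-8859-1, so each Thai character becomes three Latin-1 characters.
    static String decodeUtf8AsLatin1(String original) {
        return new String(original.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Undoes that specific mistake. ISO-8859-1 maps each byte 0x00-0xFF to exactly
    // one char, so re-encoding with it restores the original bytes, which are
    // then decoded as UTF-8.
    static String repairLatin1Mojibake(String garbled) {
        return new String(garbled.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // The Thai file name from the question, written with Unicode escapes
        String original = "ALMS_\u0E02\u0E31\u0E49\u0E19\u0E15\u0E2D\u0E19\u0E25\u0E07\u0E17\u0E30\u0E40\u0E1A\u0E35\u0E22\u0E19.pdf";
        String garbled = decodeUtf8AsLatin1(original);
        System.out.println(garbled.equals(original));                       // false
        System.out.println(repairLatin1Mojibake(garbled).equals(original)); // true

        // Encoding and decoding with the same charset changes nothing
        String roundTrip = StandardCharsets.UTF_8.decode(StandardCharsets.UTF_8.encode(garbled)).toString();
        System.out.println(roundTrip.equals(garbled));                      // true
    }
}
```

Such a repair is only a receiving-side workaround: applied to a name that was already decoded correctly, getBytes(ISO_8859_1) replaces the Thai characters with '?', so fixing the decoding where the header is parsed is preferable when the container allows it.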

Use of the Content-Disposition header is covered by RFC 6266.

The filename attribute must be encoded in ISO-8859-1. Other charsets can be supported by using the same attribute name followed by an asterisk, filename*, with a URL-encoded filename as its value.
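On the receiving side, a parser could prefer filename* over filename when both are present. The sketch below assumes the client actually sends filename* (many form uploads carry only filename, so check what your client sends). ContentDispositionFilename and its methods are hypothetical names, not part of CXF or the JAX-RS multipart API. Note that the question's `disposition.indexOf("filename") != -1` check matches both filename and filename*, so it cannot tell them apart.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import java.util.Locale;

// Hypothetical helper for illustration, not a library API.
class ContentDispositionFilename {

    // Returns the file name from a Content-Disposition value, preferring filename*
    // over filename. Simplified: splits on ';' and does not handle a ';' inside
    // a quoted file name.
    static String extractFilename(String header) {
        String plain = null;
        for (String part : header.split(";")) {
            int eq = part.indexOf('=');
            if (eq < 0) {
                continue; // e.g. the leading "form-data" token
            }
            String key = part.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            String value = part.substring(eq + 1).trim();
            if (key.equals("filename*")) {
                return decodeExtValue(value); // the extended form wins
            }
            if (key.equals("filename")) {
                plain = unquote(value);
            }
        }
        return plain;
    }

    // Strips the surrounding double quotes of a quoted value.
    static String unquote(String value) {
        if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
            return value.substring(1, value.length() - 1);
        }
        return value;
    }

    // Decodes charset'language'percent-encoded-bytes, e.g. UTF-8''%e2%82%ac%20rates.
    // Unlike URLDecoder, a '+' stays a literal '+' instead of becoming a space.
    static String decodeExtValue(String extValue) {
        int first = extValue.indexOf('\'');
        int second = first < 0 ? -1 : extValue.indexOf('\'', first + 1);
        if (second < 0) {
            throw new IllegalArgumentException("Not an RFC 5987 value: " + extValue);
        }
        Charset charset = Charset.forName(extValue.substring(0, first));
        String encoded = extValue.substring(second + 1);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        for (int i = 0; i < encoded.length(); i++) {
            char c = encoded.charAt(i);
            if (c == '%') {
                if (i + 2 >= encoded.length()) {
                    throw new IllegalArgumentException("Truncated escape in: " + extValue);
                }
                int hi = Character.digit(encoded.charAt(i + 1), 16);
                int lo = Character.digit(encoded.charAt(i + 2), 16);
                if (hi < 0 || lo < 0) {
                    throw new IllegalArgumentException("Bad escape in: " + extValue);
                }
                bytes.write((hi << 4) | lo);
                i += 2;
            } else {
                bytes.write(c); // unencoded attr-chars are plain ASCII
            }
        }
        return new String(bytes.toByteArray(), charset);
    }
}
```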

See the examples in section 5 of the RFC; for instance, the filename "€ rates" (euro rates) encoded in UTF-8 is written as:

filename*=UTF-8''%e2%82%ac%20rates

Yes, that's a weird notation, but it is not a typo: the original attribute name is followed by an asterisk, and the value starts with the charset (UTF-8), followed by two single quotes (the empty slot between them is an optional language tag), then the URL-encoded filename. Note that this is path encoding, not form-parameter encoding: spaces are replaced by %20, not by +.
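For the sending side, the notation can be produced by percent-encoding the UTF-8 bytes of the name and leaving only the RFC 5987 attr-chars (letters, digits and a small set of punctuation) as they are. This is a minimal sketch with a hypothetical class name; java.net.URLEncoder is not a drop-in replacement here because it encodes a space as +.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration, not a library API.
class Rfc5987Encoder {

    // Punctuation allowed unencoded in an RFC 5987 value (attr-char);
    // letters and digits are checked separately.
    private static final String ATTR_CHAR_PUNCT = "!#$&+-.^_`|~";

    // Builds a filename* value: the charset, two single quotes with an empty
    // language tag between them, then the percent-encoded UTF-8 bytes.
    static String encodeFilenameStar(String name) {
        StringBuilder sb = new StringBuilder("UTF-8''");
        for (byte b : name.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            boolean alnum = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
            if (alnum || ATTR_CHAR_PUNCT.indexOf(c) >= 0) {
                sb.append((char) c);
            } else {
                // Path-style encoding: a space becomes %20, never +
                sb.append('%').append(String.format("%02x", c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The euro-sign example from RFC 6266
        System.out.println("filename*=" + encodeFilenameStar("\u20ac rates"));
        // prints filename*=UTF-8''%e2%82%ac%20rates
    }
}
```

Lowercase hex digits are used only to match the RFC example; uppercase escapes such as %E2 are equally valid.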
