ArgumentError invalid byte sequence in UTF-8

What I want to solve

The following error occurs when I download the files compressed into a single zip file:

invalid byte sequence in UTF-8

To get rid of this error I have to remove the characters that are invalid in UTF-8 from the string, so I used the encode method to convert from UTF-8 to UTF-8. However, the string I want to display is then not displayed correctly. It looks like the image.

file_name.encode("UTF-8", "UTF-8", invalid: :replace)

[image]

Is there any solution to this problem?

I would be glad to know.

Source code

Zip::File.open_buffer(obj) do |zip|
  zip.each do |entry|
    ext = File.extname(entry.name)
    file_name = File.basename(entry.name)

    # file_name.encode!("UTF-8", "UTF-8", invalid: :replace)

    next if ext.blank? || file_name.count(".") > 1

    dir = File.join(dir_name, File.dirname(entry.name))

    FileUtils.mkpath(dir.to_s)

    zip.extract(entry, dir + ".txt" || ".jpg" || ".csv") { true }

    file_name.force_encoding("UTF-8")
    new_file_name = "#{dir_name}/#{file_name}"

    new_file_name.force_encoding("UTF-8")
    File.rename(dir + ".txt" || ".jpg" || ".csv", new_file_name)

    @input_dir << new_file_name
  end
end

Zip::OutputStream.open(zip_file.path) do |zip_data|
  @input_dir.each do |file|
    zip_data.put_next_entry(file)
    zip_data.write(File.read(file.to_s))
  end
end

Environment

macOS Catalina 10.15.7, Ruby 2.6.3

You get these errors because the Zip gem assumes the filenames to be encoded in UTF-8 but they are actually in a different encoding.
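To see the problem in isolation, here is a minimal sketch that uses only the base-name bytes of the file from this question (the constant and method names are made up for illustration). It shows that the bytes are not valid UTF-8, and that merely replacing the invalid bytes, which is roughly what encode with invalid: :replace does, leaves U+FFFD placeholders instead of the Japanese text:

```ruby
# Bytes of the entry's base name as stored in the zip file.
RAW_NAME_BYTES = [87, 78, 83, 95, 85, 80, 151, 112, 131, 102,
                  129, 91, 131, 94, 46, 116, 120, 116].freeze

# Builds the string the way it reaches the code: raw bytes labeled as UTF-8.
def mislabeled_name
  RAW_NAME_BYTES.pack('C*').force_encoding('UTF-8')
end

name = mislabeled_name
puts name.valid_encoding?   # false: the bytes are not valid UTF-8

# String#scrub (Ruby 2.1+) swaps every invalid byte for U+FFFD,
# so the original characters are lost rather than recovered.
puts name.scrub.inspect
```

Replacing bytes therefore silences the error but cannot bring the original characters back; for that the real source encoding is needed.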

To fix the error, you first have to find the correct encoding. Let's re-create the string from its bytes:

bytes = [111, 117, 116, 112, 117, 116, 50, 48, 50, 48, 49,
         50, 48, 55, 95, 49, 52, 49, 54, 48, 50, 47, 87,
         78, 83, 95, 85, 80, 151, 112, 131, 102, 129, 91,
         131, 94, 46, 116, 120, 116]

string = bytes.pack('C*')
#=> "output20201207_141602/WNS_UP\x97p\x83f\x81[\x83^.txt"

We can now traverse Encoding.list and select the encodings that return the expected result:

Encoding.list.select do |enc|
  s = string.encode('UTF-8', enc) rescue next
  s.end_with?('WNS_UP用データ.txt')
end
#=> [
#     #<Encoding:Windows-31J>,
#     #<Encoding:Shift_JIS>,
#     #<Encoding:SJIS-DoCoMo>,
#     #<Encoding:SJIS-KDDI>,
#     #<Encoding:SJIS-SoftBank>
#   ]

All of the above encodings result in the correct output.
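One way to double-check a candidate is to go in the opposite direction: encode the expected UTF-8 name into that encoding and compare the bytes with the ones found in the zip entry. A small sketch, with a helper name and constant that are only illustrative:

```ruby
# Tail of the byte array from the zip entry: "WNS_UP", the Japanese part, ".txt".
ORIGINAL_TAIL = [87, 78, 83, 95, 85, 80, 151, 112, 131, 102,
                 129, 91, 131, 94, 46, 116, 120, 116].freeze

# Encodes a UTF-8 string as Windows-31J and returns its bytes.
def windows_31j_bytes(utf8_text)
  utf8_text.encode('Windows-31J').bytes
end

puts windows_31j_bytes('WNS_UP用データ.txt') == ORIGINAL_TAIL   # true
```

If the round trip reproduces the original bytes, the chosen encoding is at least consistent with this file name.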

Back to your code, you could use:

path = entry.name.encode('UTF-8', 'Windows-31J')
#=> "output20201207_141602/WNS_UP用データ.txt"

ext = File.extname(path)
#=> ".txt"

file_name = File.basename(path)
#=> "WNS_UP用データ.txt"
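If the archive may also contain names that already are UTF-8, the conversion could be wrapped in a small helper. This is only a sketch: utf8_entry_name is a hypothetical method, not part of the Zip gem, and it assumes that every name that is not valid UTF-8 is Windows-31J:

```ruby
# Hypothetical helper: returns the entry name as valid UTF-8.
# Names that are already valid UTF-8 pass through unchanged; anything
# else is assumed to be in the fallback encoding and is transcoded.
def utf8_entry_name(raw_name, fallback = 'Windows-31J')
  name = raw_name.dup.force_encoding('UTF-8')
  return name if name.valid_encoding?

  raw_name.dup.force_encoding(fallback)
          .encode('UTF-8', invalid: :replace, undef: :replace)
end

raw = [87, 78, 83, 95, 85, 80, 151, 112, 131, 102,
       129, 91, 131, 94, 46, 116, 120, 116].pack('C*')

puts utf8_entry_name(raw)       # WNS_UP用データ.txt
puts utf8_entry_name('a.txt')   # a.txt
```

In the loop from the question it would be called as path = utf8_entry_name(entry.name). The validity check is a heuristic, so a name in another legacy encoding would still come out wrong.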

The Zip gem also has an option to set an explicit encoding for non-ASCII file names. You might want to give it a try by setting Zip.force_entry_names_encoding = 'Windows-31J' (I haven't tried it).
