Trying to use iconv to convert US-ASCII to UTF-16LE and getting unexpected output

Question

I'm trying to convert a file, System.Web.WebPages.Razor.dll.refresh, from ASCII to UTF-16LE. When I run the file -i command on other refresh files in the directory, I get something like:

System.Web.Optimization.dll.refresh: text/plain; charset=utf-16le

And when I run it on my target file I get:

System.Web.WebPages.Razor.dll.refresh: text/plain; charset=us-ascii

I think this encoding difference is causing an error in my build pipeline, so I'm trying to convert this ASCII file to UTF-16LE so that it matches the other refresh files. However, iconv doesn't seem to be giving me the output I'm looking for.

My command:

iconv -f US-ASCII -t UTF-16LE "System.Web.WebPages.Razor.dll.refresh" > "System.Web.WebPages.Razor.dll.refresh.new" && mv -f "System.Web.WebPages.Razor.dll.refresh.new" "System.Web.WebPages.Razor.dll.refresh"

There are two issues with the output.

1) It spaces the file out (i.e. from this to this).

2) When I run file -i on this new file, I get the following output:

System.Web.WebPages.Razor.dll.refresh: application/octet-stream; charset=binary

Why am I getting this binary output, and why is it spacing out the text? Is there a better way to convert this file to the proper encoding?

Answer 1

file is showing your new file as binary data because it relies on a leading Byte Order Mark (BOM) to tell if the contents are encoded in UTF-16. When you specify the endianness, iconv will leave out the BOM:

$ iconv -f us-ascii -t utf16le <<<test | xxd
00000000: 7400 6500 7300 7400 0a00                 t.e.s.t...

However, if you let it use the native endianness (which on typical modern hardware is almost always going to be LE):

$ iconv -f us-ascii -t utf16 <<<test | xxd
00000000: fffe 7400 6500 7300 7400 0a00            ..t.e.s.t...

the mark is there, and file -i will report it as foo.txt: text/plain; charset=utf-16le.
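As a rough check of the two behaviours described above, the sketch below looks at the first two bytes of each conversion and compares sizes. It uses od instead of xxd purely for portability, and the variable names are made up for illustration. Whether plain UTF-16 emits a BOM, and in which byte order, depends on the iconv implementation, so treat the second result as something to inspect rather than rely on.

```shell
# First two bytes of each conversion, printed as hex.
# An explicit UTF-16LE target starts straight with 't' (74 00): no BOM.
no_bom=$(printf 'test\n' | iconv -f US-ASCII -t UTF-16LE | head -c 2 | od -An -tx1 | tr -d ' \n')
# Plain UTF-16 is implementation-dependent; the answer's system shows ff fe here.
with_bom=$(printf 'test\n' | iconv -f US-ASCII -t UTF-16 | head -c 2 | od -An -tx1 | tr -d ' \n')
echo "UTF-16LE starts with: $no_bom"
echo "UTF-16 starts with:   $with_bom"

# Every ASCII character becomes two bytes (the character plus a NUL),
# so the converted stream is twice the size of the input.
ascii_size=$(printf 'test\n' | wc -c)
utf16le_size=$(printf 'test\n' | iconv -f US-ASCII -t UTF-16LE | wc -c)
echo "ASCII bytes: $ascii_size, UTF-16LE bytes: $utf16le_size"
```

The extra NUL after each character is probably also what makes the text look spaced out: a viewer that reads the file as a single-byte encoding tends to render each NUL as a gap.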

I'm not aware of a way to force iconv to always add the BOM with an explicit UTF-16 endianness. Instead, here's a perl one-liner that will convert to explicit UTF-16LE and add the BOM:

perl -0777 -pe 'BEGIN{binmode STDOUT,":encoding(utf16le)"; print "\x{FEFF}"}' in.txt > out.txt

Or alternatively, using printf to print the LE-encoded BOM and iconv for the rest:

(printf "\xFF\xFE"; iconv -f us-ascii -t utf-16le in.txt) > out.txt
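To tie this back to the in-place conversion in the question, here is a minimal sketch that wraps the printf plus iconv approach in a function. The name to_utf16le_bom and the .new temporary suffix are assumptions for illustration, not part of any tool. It writes the BOM with octal escapes, which printf implementations handle more consistently than \x escapes, and only replaces the original if the conversion succeeds.

```shell
# Hypothetical helper: rewrite an ASCII file in place as UTF-16LE with a BOM.
to_utf16le_bom() {
    src=$1
    tmp="$src.new"   # temporary name, following the pattern used in the question
    # \377\376 are the octal escapes for the bytes FF FE (the little-endian BOM).
    if { printf '\377\376' && iconv -f US-ASCII -t UTF-16LE "$src"; } > "$tmp"; then
        mv -f "$tmp" "$src"
    else
        # Conversion failed: discard the partial output and keep the original.
        rm -f "$tmp"
        return 1
    fi
}
```

It could be called as to_utf16le_bom "System.Web.WebPages.Razor.dll.refresh", after which file -i should report charset=utf-16le, as described above for files that start with the BOM. Running it twice on the same file would fail or corrupt it, since the second run no longer sees plain ASCII input.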

Answer 1: 2, accepted, 2019-08-19 19:14:22
