Converting mangled characters back to UTF-8

Question

Here is what I did:

I dumped a SQLite database containing UTF-8 data (sqlite3 example.db .dump > dump.sql), but since this was in PowerShell, I assume the redirection converted it to windows-1252.
I loaded that dumped data into a new database, again using PowerShell (Get-Content dump.sql | sqlite3 example2.db).
I dumped that new database and was left with a new .sql file (this time not through PowerShell, so I assume it was unmodified).

This new SQL file's UTF-8 characters are seriously mangled, and I was wondering if there is a way to convert it back into correct UTF-8.

Here are a few examples of sequences in the new file and what they should be (all viewed as UTF-8):

ÒüéÒü¬ÒüƒÒü½ should be あなたに
´╝ü should be a full-width exclamation mark (！)
Òé¡Òé╗Òé¡ should be キセキ

Does anyone have any idea how I might undo this mangling? Any method would be very helpful!

This is in PowerShell 7.0.1.

Edit:

On further inspection, you can reproduce my predicament by redirecting any such data to a file in PowerShell (note that the data cannot itself be entered in PowerShell). Hence, setting up a script like this gives the same outcome:

test.sh

#!/bin/bash
echo "キ"

Running wsl ./test.sh > test.txt then gives an output of Òé¡, not キ.

Edit 2:

It seems as if the code page the UTF-8 text was converted to is almost 437: some characters (e.g. 木) are restored under this assumption, but others are not. If it is close to 437 but is not 437, what could it be?
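One way to test a guess like this is to run a known mangled sample through several candidate code pages and see which one yields valid UTF-8. The following is only a sketch: the helper name try_codepages and the candidate list are illustrative choices, not part of the original post, and the codec names are Python's standard ones.

```python
def try_codepages(mangled, candidates=("cp437", "cp850", "cp1252")):
    """Return {codec: repaired text, or None if that codec cannot explain the text}."""
    results = {}
    for codec in candidates:
        try:
            # Re-encode with the candidate code page to get the raw bytes back,
            # then check whether those bytes are valid UTF-8.
            results[codec] = mangled.encode(codec).decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            results[codec] = None
    return results


# Sample taken from the question; it should have been キセキ.
for codec, text in try_codepages("Òé¡Òé╗Òé¡").items():
    print(codec, "->", text)
```

A candidate that returns None either lacks one of the mangled characters or produces bytes that are not valid UTF-8, so it can be ruled out for that sample.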

Answer 1

It turns out that, because I am in the UK, the code page I wanted was 850. Saving the file as code page 850 and then reloading it as UTF-8 solved my problem!
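For anyone who prefers to script the same round trip, here is a minimal Python sketch of "save as 850, reload as UTF-8". The function name repair_file and the temporary file names are hypothetical, and the default code page of 850 is an assumption that matches this case; other locales may need a different one.

```python
import tempfile
from pathlib import Path


def repair_file(src, dst, codepage="cp850"):
    """Undo text whose UTF-8 bytes were misread as `codepage` and re-saved as UTF-8."""
    # Decode bytes directly so line endings are preserved; utf-8-sig drops a BOM if present.
    mangled = Path(src).read_bytes().decode("utf-8-sig")
    # Encoding back to the code page recovers the original UTF-8 bytes.
    original_bytes = mangled.encode(codepage)
    # Fail loudly (UnicodeDecodeError) if the result is not valid UTF-8.
    original_bytes.decode("utf-8")
    Path(dst).write_bytes(original_bytes)


# Small self-contained demo using a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "mangled.sql"  # hypothetical file names
    dst = Path(tmp) / "fixed.sql"
    src.write_text("Òé¡Òé╗Òé¡", encoding="utf-8")
    repair_file(src, dst)
    print(dst.read_text(encoding="utf-8"))
```

If the encode step raises UnicodeEncodeError, the file contains a character that does not exist in the chosen code page, which suggests a different code page was involved.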

Answer 1 status: accepted, 2020-08-28 00:27:49
