
Special characters get lost in MySQL export/import

Original post: 2012-06-25 07:47:06. Tags: mysql, utf-8, character-encoding, latin1

I'm trying to move a MySQL 3.23.58 database to a different server running 5.5.19.

The old one has latin1 encoding specified, and as far as I can tell the underlying data really is latin1. I have tried many things, chiefly:

exporting from terminal with mysqldump and the latin1 encoding flag.
editing in vim to change "TYPE=InnoDB" to "ENGINE=InnoDB" for MySQL 5 compatibility.
importing to the new server from terminal.

Browsing the old server (in Sequel Pro for Mac, or MySQL Query Browser on PC), special characters don't always show properly, but they're there (looking at the binary in hex). (And in any case it works with the PHP web app.)

Browsing the new server, all special characters appear to have been replaced by question marks. I know that sometimes special characters can display as a question mark (or �) if the wrong encoding is specified. But these appear to be genuine ASCII question marks at the binary level. The special characters (chiefly curly quotation marks and dashes) appear to have been lost, or destroyed, in the export/import.

Any idea why?

I know there are many things that can go wrong with encoding, and many different things that can be at fault. I have been reading about this for several days (here and elsewhere) and tried setting all the right character encodings, tried UTF-8, tried casting and converting, tried Sequel Pro's export/import (as opposed to the terminal), etc. But I am stumped.

1 Answer

Good, it looks like we've narrowed down your problem. I found this post:

If your text editor is vim, then most likely the "<92>" is the hexadecimal code of an extended ASCII character. In this case, it is Hex(92) or Oct(222) or Dec(146), which is "right single quotation mark"; not to be confused with "single quote", which is ASCII Dec code 39.
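To make the byte values in the quoted post concrete, here is a minimal Python sketch that decodes 0x91 through 0x97 two ways. It assumes the bytes are Windows-1252 (cp1252) punctuation, which is consistent with the MySQL manual describing its latin1 as "cp1252 West European"; in strict ISO-8859-1 the same bytes are C1 control characters. The helper name `describe` is made up for this illustration.

```python
import unicodedata


def describe(value: int) -> str:
    """Describe one byte value under cp1252 and under ISO-8859-1."""
    raw = bytes([value])
    char = raw.decode("cp1252")   # Windows-1252 interpretation
    iso = raw.decode("latin-1")   # ISO-8859-1 maps the byte to the same code point
    return (
        f"0x{value:02X} (dec {value}, oct {value:o}): "
        f"cp1252 U+{ord(char):04X} {unicodedata.name(char)}; "
        f"ISO-8859-1 U+{ord(iso):04X} (control)"
    )


# The range vim shows as <91> ... <97>.
for value in range(0x91, 0x98):
    print(describe(value))
```

For 0x92 this reports decimal 146, octal 222 and RIGHT SINGLE QUOTATION MARK, matching the values quoted above.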

One way to remove all non-ASCII characters from your file could be:

perl -plne 's/[^[:ascii:]]//g' <your_file>

Otherwise just search and replace "<92>" and "<97>" in your exported file with an appropriate character.
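The search-and-replace idea could be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the dump is handled as raw bytes, the affected bytes are the cp1252 punctuation 0x91 to 0x97, and they sit inside single-quoted SQL string literals with backslash escapes enabled, which is why single quotes are written as `\'`. The mapping and the function name `replace_smart_punctuation` are my own choices, not something from the post.

```python
# Hypothetical mapping from cp1252 punctuation bytes to ASCII stand-ins.
ASCII_EQUIVALENTS = {
    0x91: b"\\'",  # left single quote  -> escaped apostrophe
    0x92: b"\\'",  # right single quote -> escaped apostrophe
    0x93: b'"',    # left double quote
    0x94: b'"',    # right double quote
    0x95: b"*",    # bullet
    0x96: b"-",    # en dash
    0x97: b"--",   # em dash
}


def replace_smart_punctuation(data: bytes) -> bytes:
    """Replace each mapped byte; leave every other byte untouched."""
    out = bytearray()
    for byte in data:
        out += ASCII_EQUIVALENTS.get(byte, bytes([byte]))
    return bytes(out)


sample = b"INSERT INTO t VALUES ('It\x92s \x93ok\x94 \x97 fine');"
print(replace_smart_punctuation(sample))
```

Working on bytes rather than decoded text sidesteps any guess about the file's encoding, but it also discards the original characters, so keep a copy of the untouched dump.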

[Edit]

I'm not a Vim user, but this post addresses the issue of replacing the <92> smart quote characters:

For each value that you see in your file, just do a string substitution, like so:

:%s/<93>/\\'/g

Of course, you can't just type that <93> in there, so to get it in there you use

CTRL-V x 93

which inserts hex 93 in place.

In recently exported CSVs from Excel, I've seen hex 91-97.
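To find out which values actually occur before substituting them one by one, a short Python sketch like this can tally every byte at or above 0x80. The sample bytes are invented for the example; for a real dump you would read the file in binary mode, for instance with `open(path, "rb").read()`, where `path` is whatever your export is called.

```python
from collections import Counter


def count_high_bytes(data: bytes) -> Counter:
    """Count how often each non-ASCII byte (0x80 and above) occurs."""
    return Counter(byte for byte in data if byte >= 0x80)


# Invented sample standing in for the contents of an exported file.
sample = b"\x93Hello\x94 \x97 it\x92s here, isn\x92t it"

for byte, count in sorted(count_high_bytes(sample).items()):
    print(f"<{byte:02x}> occurs {count} time(s)")
```

The output uses the same `<92>` notation that vim shows, so each reported value can be fed straight into the substitution above.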