How can I insert arbitrary binary data into a VARCHAR column?

I have a MySQL table with a VARCHAR(100) column, using the utf8_general_ci collation.

I can see rows where this column contains arbitrary byte sequences (i.e., data that contains invalid UTF-8 character sequences), but I can't figure out how to write an UPDATE or INSERT statement that allows this type of data to be entered.

For example, I've tried the following:

UPDATE DataTable SET Data = CAST(BINARY(X'16d7a4fca7442dda3ad93c9a726597e4') AS CHAR(100)) WHERE Id = 1;

But I get the error:

Incorrect string value: '\xFC\xA7D-\xDA:...' for column 'Data' at row 1

How can I write an INSERT or UPDATE statement that bypasses the destination column's collation, allowing me to insert arbitrary byte sequences?
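As a quick check of where the error above comes from, the sketch below runs the 16 bytes from the question's X'...' literal through a strict UTF-8 decoder in Python. Python's decoder is only an analogy for MySQL's utf8 validation (they differ in details; for example, MySQL's 3-byte utf8 rejects 4-byte sequences that Python accepts), and `first_invalid_utf8_offset` is a hypothetical helper, not part of any MySQL API. For this data it points at the same byte as MySQL's message: 0x16 and the pair 0xD7 0xA4 are valid, but 0xFC at offset 3 is not a legal UTF-8 lead byte.

```python
# The 16 bytes from the question's X'...' literal.
data = bytes.fromhex("16d7a4fca7442dda3ad93c9a726597e4")


def first_invalid_utf8_offset(raw: bytes):
    """Hypothetical helper: offset of the first invalid UTF-8 sequence, or None."""
    try:
        raw.decode("utf-8")  # strict decoding raises on the first bad sequence
        return None
    except UnicodeDecodeError as err:
        return err.start


offset = first_invalid_utf8_offset(data)
print(offset)                       # 3
print(data[offset:offset + 6])      # b'\xfc\xa7D-\xda:'
```

The printed slice matches the `'\xFC\xA7D-\xDA:...'` prefix in MySQL's "Incorrect string value" message.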

Have you considered using one of the BLOB data types instead of VARCHAR? I believe that this would take a lot of the pain away from your use case.

EDIT: Alternatively, there are the HEX() and UNHEX() functions, which MySQL supports. HEX() takes either a string or a numeric argument and returns the hexadecimal representation of the argument as a string. UNHEX() does the inverse, taking a hexadecimal string and returning a binary string.
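A minimal client-side sketch of that round trip in Python, assuming you want the same text HEX() would produce (uppercase hex digits), either to store as plain ASCII or to pass as the argument of UNHEX('...'). The helper names are hypothetical stand-ins for the SQL functions, not a driver API.

```python
data = bytes.fromhex("16d7a4fca7442dda3ad93c9a726597e4")


def mysql_style_hex(raw: bytes) -> str:
    """Hypothetical stand-in for HEX(): uppercase hex digits, two per byte."""
    return raw.hex().upper()


def mysql_style_unhex(text: str) -> bytes:
    """Hypothetical stand-in for UNHEX(): hex digits back to raw bytes."""
    return bytes.fromhex(text)


encoded = mysql_style_hex(data)
print(encoded)                      # 16D7A4FCA7442DDA3AD93C9A726597E4
print(encoded.isascii())            # True
print(mysql_style_unhex(encoded) == data)  # True
```

Because the hex text is pure ASCII, it is always valid UTF8. The trade-off is two characters per byte, so a VARCHAR(100) column could hold at most 50 bytes of original data this way.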

The short answer is that it shouldn't be possible to insert values with invalid UTF8 characters into a VARCHAR column declared to use the UTF8 character set.

That's the design goal of MySQL: to disallow invalid values. When there's an attempt to do that, MySQL will return either an error or a warning, or (more leniently?) silently truncate the supplied value at the first invalid character encountered.
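To make the lenient "truncate at the first invalid character" behaviour concrete, here is a hypothetical Python emulation. It is not MySQL's code, and what the server actually does depends on its version and SQL mode; it only shows which part of the question's value would survive such a truncation.

```python
data = bytes.fromhex("16d7a4fca7442dda3ad93c9a726597e4")


def truncate_at_first_invalid(raw: bytes) -> str:
    """Hypothetical emulation: keep only the valid UTF-8 prefix of raw."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as err:
        # err.start is the offset of the first invalid sequence;
        # every byte before it is part of a complete, valid sequence.
        return raw[:err.start].decode("utf-8")


kept = truncate_at_first_invalid(data)
print(len(kept))   # 2 characters survive: U+0016 and U+05E4
```

Only 3 of the 16 bytes survive, which is why silent truncation is rarely what you want for binary data.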

The more common kind of character set issue is MySQL performing a character set conversion when no conversion is required.
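For contrast, a small Python illustration of that more common problem, assuming the client sends UTF-8 bytes but they are treated as latin1 and converted to UTF-8 a second time. The word "café" is just a made-up sample value. The result is valid UTF-8, only it is the wrong text (often called double encoding).

```python
# The client already sends UTF-8 bytes for "café".
utf8_bytes = "café".encode("utf-8")             # b'caf\xc3\xa9'

# Unneeded conversion: the bytes are read as latin1, then encoded to UTF-8 again.
double_encoded = utf8_bytes.decode("latin-1").encode("utf-8")

print(double_encoded)                   # b'caf\xc3\x83\xc2\xa9'
print(double_encoded.decode("utf-8"))   # cafÃ©
```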

But the issue you are reporting is that invalid characters were inserted into a UTF8 column. It's as if a latin1 (ISO-8859-1) encoding was supplied and a character set conversion was required, but was not performed.
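A Python sketch of that scenario, again using "café" as a made-up sample value: the latin1 bytes are not valid UTF-8 on their own, and the conversion that was skipped would have turned them into valid UTF-8. `is_valid_utf8` is a hypothetical helper.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Hypothetical helper: True if raw decodes as strict UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


latin1_bytes = "café".encode("latin-1")          # b'caf\xe9'
print(is_valid_utf8(latin1_bytes))               # False

# The conversion that should have happened: latin1 -> text -> UTF-8.
converted = latin1_bytes.decode("latin-1").encode("utf-8")
print(converted)                                 # b'caf\xc3\xa9'
print(is_valid_utf8(converted))                  # True
```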

As far as working around that... I believe it was possible in earlier versions of MySQL. I believe it was possible to cast a value to BINARY and then wrap that in CONVERT( ... USING UTF8), and MySQL wouldn't perform a validation of the character set. I don't know if that's still possible with the current MySQL Connectors.

If it is possible, then that's (IMO) a bug in the Connector.

The only way I can think of to get around that character set check/validation would be to get the MySQL server to trust the client and decide that no check of the character set is required. (That would also mean the MySQL server wouldn't be doing a character set conversion: the client would be lying to the server, telling it that it's supplying valid UTF8 characters.)

Basically, the client would be telling the server, "Hey server, I'm going to be sending UTF8 character encodings."

And the server says, "Okay. I'll not do any character set conversion then, since we match. And I'll just trust that what you send is valid UTF8."

And then the client mischievously chuckles to itself, "Heh, heh, I lied. I'm actually sending character encodings that aren't valid UTF8."

And I think such mischief is much more likely to be achievable using prepared statements with the old-school MySQL C API (mysql_stmt_prepare, mysql_stmt_execute), supplying invalid UTF8 encodings as values for string bind parameters. (The onus is really on the client to supply valid values for bind parameters.)

You should base64-encode the value beforehand, so that you can use it to generate valid SQL:

UPDATE DataTable SET Data = from_base64('mybase64-encoded-representation-of-my-value') WHERE Id = 1;
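A client-side sketch in Python of producing that base64 text from the question's bytes. The statement string is only for illustration (the table and column names come from the question); real code would normally pass the value as a bind parameter rather than formatting SQL. Base64 keeps the SQL text itself pure ASCII; FROM_BASE64() then hands the original bytes back to the server.

```python
import base64

data = bytes.fromhex("16d7a4fca7442dda3ad93c9a726597e4")

# Base64 output uses only A-Z, a-z, 0-9, '+', '/' and '=', so it is safe
# inside a single-quoted SQL string literal.
encoded = base64.b64encode(data).decode("ascii")

# Illustrative statement text only; prefer parameter binding in real code.
sql = f"UPDATE DataTable SET Data = FROM_BASE64('{encoded}') WHERE Id = 1;"
print(sql)
```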
