
Storing a serialized object in MySql database

I have a big PHP object that I want to serialize and store in a MySQL database. The table encoding is UTF-8, and the column that holds the serialized object is also UTF-8.

The problem is that the object holds a text string containing French characters.

For example:

Merci d'avoir passé commande avec Lovre. Voici le récapitulatif de votre commande 

When I serialize the object and then unserialize it again directly, the string is preserved and is in the correct format.

However, when I store the serialized object in a MySQL database, retrieve it, and then unserialize it, the string becomes like this:

Merci d'avoir passÃ© commande avec Lovre. Voici le rÃ©capitulatif de votre commande

Something goes wrong when I store the object in the database.

Notes:

  • The object is stored using the Propel ORM.
  • The column type is text.
  • The string is stored in and read from an HTML file.

The strings created by serialize are binary strings. They don't have a specific charset encoding; they are just an "array" of bytes (where one byte is 8 bits, an octet).

If you now take such a string and tell your database that it is LATIN-1 encoded, and your database stores it in a text field with UTF-8 encoding, the database will transparently change the encoding from LATIN-1 to UTF-8. UTF-8 is a charset encoding that uses more than one byte per character for some characters, for example those you give in your question, like é.

The character é is then stored as Ã© inside the database: the two bytes of the UTF-8 sequence for é (0xC3 0xA9) are each treated as a separate LATIN-1 character and re-encoded.

If you now fetch the data from the database without specifying which encoding you need it in, the database will return it as UTF-8.

Now unserialize has a problem because the binary string has been modified in a way that makes it invalid: the byte lengths recorded in the serialized string no longer match the re-encoded content.
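To make the failure concrete, here is a minimal sketch that imitates the conversion in plain PHP, without a database. It assumes the mbstring extension is available, and `mb_convert_encoding()` is only a stand-in for what the MySQL connection does; the variable names are made up for illustration.

```php
<?php
// Simulate a UTF-8 serialized string being declared as LATIN-1 and
// converted to UTF-8 on its way into the database. Requires ext-mbstring.

$original   = "pass\xC3\xA9";        // "passé" as UTF-8 bytes (é = 0xC3 0xA9)
$serialized = serialize($original);  // s:6:"passé";  (6 is the byte length)

// Stand-in (assumption) for the connection-level LATIN-1 -> UTF-8 conversion.
$stored = mb_convert_encoding($serialized, 'UTF-8', 'ISO-8859-1');

// Each of the two bytes of é became its own two-byte character, so the
// payload is now 8 bytes long while the length prefix still says 6.
$roundTrip = @unserialize($stored);  // false: the prefix no longer matches

var_dump(strlen($serialized), strlen($stored), $roundTrip);
```

The `@` only silences the notice or warning that unserialize raises for a malformed string; the return value `false` is what signals the failure.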

Instead, you need to do one of two things. Either tell your database not to modify the encoding when it stores the serialized string, e.g. by choosing the right column type and encoding (a binary field, BLOB - Binary Large Object, see the MySQL docs and the Binary Types section of the Propel docs), or revert the charset encoding back to the original format when you fetch the data from the database. The first approach (binary field) is better because it is exactly what you're looking for.

For the data that has already been stored in the database in the wrong format, you need to correct the data. To do that, you first need to find out which re-encoding was applied, i.e. from which charset to which charset. I assume it's LATIN-1, but there is no guarantee. You need to review the encoding of your current application data and processes to find out.

After you've found out, encode the values back from UTF-8 to the original encoding.
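A rough sketch of that repair step might look like the following. `repairSerialized()` is a hypothetical helper, the default charset `ISO-8859-1` is only the assumption made above, and the damaged value is produced in the script itself rather than read from a real table. It needs the mbstring extension.

```php
<?php
// Sketch of reversing a LATIN-1 -> UTF-8 re-encoding. Requires ext-mbstring.
// repairSerialized() is a hypothetical helper; ISO-8859-1 is an assumption.

function repairSerialized(string $stored, string $originalCharset = 'ISO-8859-1')
{
    // Undo the conversion: read the stored value as UTF-8 and encode it
    // back into the charset it was wrongly declared as.
    $bytes = mb_convert_encoding($stored, $originalCharset, 'UTF-8');

    return unserialize($bytes);
}

// Build a damaged value the same way the database would have.
$data    = ['msg' => "r\xC3\xA9capitulatif"];
$damaged = mb_convert_encoding(serialize($data), 'UTF-8', 'ISO-8859-1');

$fixed = repairSerialized($damaged);
var_dump($fixed === $data);
```

One caveat: MySQL's latin1 is really Windows-1252 (cp1252), so if the repair fails for some characters, passing `'Windows-1252'` as the second argument is worth trying. Test on a copy of the data first.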

Make sure to use UTF-8 everywhere. It sounds like you missed something.

In your case, I think you've forgotten to set the correct charset for your database connection (using a SET NAMES statement or mysql_set_charset()), but that's hard to say without seeing your code (and I don't know Propel).
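For reference, the old mysql_* extension (including mysql_set_charset()) was removed in PHP 7, so with current PHP the charset is usually set through PDO or mysqli. A small sketch with PDO follows; `buildMysqlDsn()` is a hypothetical helper, the host, database name and credentials are placeholders, and `utf8mb4` (MySQL's full UTF-8) is my assumption rather than something from the question.

```php
<?php
// buildMysqlDsn() is a hypothetical helper; host and database are placeholders.
function buildMysqlDsn(string $host, string $dbname, string $charset = 'utf8mb4'): string
{
    // The charset part of the DSN sets the connection character set.
    return sprintf('mysql:host=%s;dbname=%s;charset=%s', $host, $dbname, $charset);
}

$dsn = buildMysqlDsn('localhost', 'shop');
echo $dsn, PHP_EOL;

// Usage (needs a running MySQL server and ext-pdo_mysql, so left commented out):
// $pdo = new PDO($dsn, 'user', 'secret', [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
// With mysqli, the equivalent call is $mysqli->set_charset('utf8mb4');
```

How Propel passes the DSN depends on its configuration, which I can't check from here.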

The following is a quote from chazomaticus, who gave a perfect answer in "UTF-8 all the way through", listing all the points you have to take care of:

Storage:

  • Specify utf8_unicode_ci (or equivalent) collation on all tables and text columns in your database. This makes MySQL physically store and retrieve values natively in UTF-8.

Retrieval:

  • In PHP, in whatever DB wrapper you use, you'll need to set the connection charset to utf8. This way, MySQL does no conversion from its native UTF-8 when it hands data off to PHP. Note that if you don't use a DB wrapper, you'll probably have to issue a query to tell MySQL to give you results in UTF-8: SET NAMES 'utf8' (as soon as you connect).

Delivery:

  • You've got to tell PHP to deliver the proper headers to the client, so text will be interpreted as UTF-8. In PHP, you can use the default_charset php.ini option, or manually issue the Content-Type header yourself, which is just more work but has the same effect.

Submission:

  • You want all data sent to you by browsers to be in UTF-8. Unfortunately, the only way to reliably do this is to add the accept-charset attribute to all your <form> tags: <form ... accept-charset="UTF-8">.
  • Note that the W3C HTML spec says that clients "should" default to sending forms back to the server in whatever charset the server served, but this is apparently only a recommendation, hence the need for being explicit on every single <form> tag.
  • Although, on that front, you'll still want to verify every submitted string as being valid UTF-8 before you try to store it or use it anywhere. PHP's mb_check_encoding() does the trick, but you have to use it religiously.
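As an illustration of the validation point above, a check along these lines could sit in front of anything that stores or processes submitted text. `requireUtf8()` is a hypothetical helper name, and the snippet assumes the mbstring extension.

```php
<?php
// requireUtf8() is a hypothetical helper. Requires ext-mbstring.
function requireUtf8(string $value): string
{
    if (!mb_check_encoding($value, 'UTF-8')) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    return $value;
}

$ok = requireUtf8("command\xC3\xA9");   // valid: é as the UTF-8 bytes 0xC3 0xA9

try {
    requireUtf8("command\xE9");         // lone LATIN-1 byte 0xE9: not valid UTF-8
    $rejected = false;
} catch (InvalidArgumentException $e) {
    $rejected = true;
}

var_dump($ok, $rejected);
```

Whether to reject, log, or try to convert invalid input is a design choice; throwing is just the simplest thing to show here.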

Processing:

  • This is, unfortunately, the hard part. You need to make sure that every time you process a UTF-8 string, you do so safely. The easiest way to do this is by making extensive use of PHP's mbstring extension.
  • PHP's string operations are NOT UTF-8 safe by default. There are some things you can safely do with normal PHP string operations (like concatenation), but for most things you should use the equivalent mbstring function.
  • To know what you're doing (read: not mess it up), you really need to know UTF-8 and how it works on the lowest possible level. Check out any of the links from utf8.com for some good resources to learn everything you need to know.
  • Also, I feel like this should be said somewhere, even though it may seem obvious: every PHP or HTML file you'll be serving should be encoded in valid UTF-8.
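A quick sketch of the difference between the byte-oriented string functions and their mbstring counterparts, again assuming ext-mbstring is loaded (the variable names are just for illustration):

```php
<?php
// Byte-oriented functions vs. mbstring on a UTF-8 string. Requires ext-mbstring.
$word = "r\xC3\xA9cap";                       // "récap" as UTF-8

$byteLength = strlen($word);                  // counts bytes
$charLength = mb_strlen($word, 'UTF-8');      // counts characters

$byteCut = substr($word, 0, 2);               // "r" plus the first half of é
$charCut = mb_substr($word, 0, 2, 'UTF-8');   // "ré"

$joined = $word . ' ' . $word;                // concatenation is byte-safe

var_dump($byteLength, $charLength);
var_dump(mb_check_encoding($byteCut, 'UTF-8'));  // the byte cut is no longer valid UTF-8
var_dump(mb_check_encoding($joined, 'UTF-8'));
```

Passing the encoding explicitly avoids depending on whatever the internal encoding happens to be set to.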

Note that you don't need to use UTF-8. The important part is to use the same charset everywhere, independent of what charset that might be. But if you need to change things anyway, use UTF-8.

I always store serialized data using base64_encode(). Serialized data sometimes causes problems, but after taking its base64 value, only simple characters remain.
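A small sketch of the idea, using the same simulated LATIN-1 to UTF-8 conversion as a stand-in for the database (an assumption, and it needs ext-mbstring). Because base64 output is plain ASCII, the conversion leaves it untouched:

```php
<?php
// base64 output is plain ASCII, so a charset conversion cannot alter it.
$data = ['msg' => "pass\xC3\xA9 commande"];   // "passé commande" as UTF-8

$encoded = base64_encode(serialize($data));

// Stand-in (assumption) for a LATIN-1 -> UTF-8 conversion by the connection.
$afterDb = mb_convert_encoding($encoded, 'UTF-8', 'ISO-8859-1');

$restored = unserialize(base64_decode($afterDb));

var_dump($afterDb === $encoded, $restored === $data);
```

The trade-off is that the stored value is about a third larger and is no longer searchable or readable in the database.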

I strongly recommend using json_encode instead of serialize. Some day you will find yourself trying to use that data from another place that is not PHP, and having it stored as JSON makes it readable everywhere; virtually every language supports decoding JSON, and it is a well-established standard.
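A minimal sketch of the JSON route (the array is made-up sample data). By default json_encode() escapes non-ASCII characters as \uXXXX, so the stored text is plain ASCII:

```php
<?php
// json_encode() escapes non-ASCII characters by default, giving ASCII output.
$data = ['msg' => "pass\xC3\xA9 commande"];   // "passé commande" as UTF-8

$json    = json_encode($data);                // {"msg":"pass\u00e9 commande"}
$decoded = json_decode($json, true);          // true: decode objects as arrays

echo $json, PHP_EOL;
var_dump($decoded === $data);
```

Keep in mind that JSON does not record PHP class names: objects come back as arrays or stdClass, and only public properties are encoded, so it is not a drop-in replacement when you need the original object restored.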

The answer about using utf8 everywhere holds! :-D
