Accents in the database

I am creating a database with MySQL. I use the utf8 character set and collation. I use a European language that has accents and special characters like ç.

What is the best way to store text in the database, with or without special characters? For example, should I use différent or diff&eacute;rent ("different" in French) in the database? (In other words, should I convert with htmlspecialchars before or after I store the text in the database?)

I tried both ways and both work well. But is there a technical reason to prefer one option, or is either one OK? I want to be sure now, while I am starting the database. Later it will be harder to change.

I think you should definitely NOT replace your characters with HTML entities: they are a convention of HTML/XML markup, not of everything!

For instance, if you had to serve JSON for some reason, you would first be forced to decode the entities in your text and only then serve it as JSON, where non-ASCII characters are escaped in a different way (for example as \u00e9).
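As a rough illustration of that extra step, here is a minimal Python sketch. The two stored values and the helper names are hypothetical, and the standard `html` and `json` modules stand in for whatever the application would really use.

```python
import html
import json

# Hypothetical column values: one stored with an HTML entity, one as plain text.
stored_as_entities = "diff&eacute;rent"
stored_as_text = "différent"

def to_json_from_entities(value: str) -> str:
    # Extra step: undo the HTML encoding before the value can be serialized.
    return json.dumps({"word": html.unescape(value)})

def to_json_from_text(value: str) -> str:
    # Plain text can be serialized directly.
    return json.dumps({"word": value})

print(to_json_from_entities(stored_as_entities))
print(to_json_from_text(stored_as_text))
```

Both calls produce the same JSON, but only the entity-encoded value needs the decoding pass first.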

Also, converting characters would make your stored strings much less human-readable (and thus less human-searchable): Le premier écoquartier d'Île-de-France a été inauguré would be encoded into something absolutely devilish.
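To get a feel for how that sentence would look, here is a small Python sketch. It uses numeric character references via the `xmlcharrefreplace` error handler, which is only one possible encoding; a PHP function such as htmlentities would emit named entities instead, but the readability problem is the same.

```python
import html

sentence = "Le premier écoquartier d'Île-de-France a été inauguré"

# Replace every non-ASCII character with a numeric character reference.
encoded = sentence.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(encoded)

# A naive search for the word a person would actually type no longer matches.
print("inauguré" in encoded)
print("inauguré" in sentence)

# The original text is only recoverable through an explicit decoding pass.
print(html.unescape(encoded) == sentence)
```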

Let MySQL do the hard work of taking care of non-ASCII characters.

There are two subjects here.

  1. Is it necessary/useful to restrict yourself to 7-bit US-ASCII in an application that's powered by UTF-8 and needs characters outside US-ASCII? It's certainly not necessary, and I can't imagine a single reason to do so. It's like saving your audio as uncompressed WAV. In most contexts, 8 bits are here to stay.

  2. Is it necessary/useful to convert your plain text to HTML in order to store it? You obviously don't need to, I can't think of a single benefit, and you burden yourself with pointless encoding/decoding for every single task, e.g. searching. HTML is not everything.
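One way to read both points together is: store the plain text as-is and escape it only when it is written into an HTML page. The Python sketch below assumes that approach; `render_html` and the sample value are hypothetical, and `html.escape` plays roughly the role that htmlspecialchars plays in PHP.

```python
import html

# Plain text exactly as it would be stored in the database (hypothetical value).
raw = "différent & <b>Île</b>"

def render_html(value: str) -> str:
    # Escape markup-significant characters only at the HTML output boundary.
    # Accented characters pass through unchanged.
    return "<p>" + html.escape(value) + "</p>"

print(render_html(raw))

# Storage cost of the single word in each form.
print(len("différent".encode("utf-8")))  # bytes as UTF-8 text
print(len("diff&eacute;rent"))           # bytes with a named entity
```

The stored value stays searchable and reusable for non-HTML outputs, and in this example the entity form is also the longer of the two.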

