What encoding is the resulting string if I concatenate a UTF-8 encoded string with an ASCII string in PHP?

If I use the function mb_convert_encoding() to convert an ASCII encoded string in PHP to a UTF-8 string, then concatenate it with an ASCII encoded string, what encoding is the result? Are there any negative consequences for doing this?

It depends first on whether you mean strict ASCII, which includes only 128 characters. Every one of these characters has exactly the same encoding in the ASCII encoding scheme as it does in the UTF-8 encoding scheme. For these characters, the mb_convert_encoding function has no effect. You can easily verify this yourself with this script:

/* Convert ASCII to UTF-8 */
for ($i=0; $i<128; $i++) {
        $str1 = chr($i);
        $str2 = mb_convert_encoding($str1, "UTF-8", "ASCII");

        echo $str1 . " - " . $str2;

        if ($str1 !== $str2) {
                echo " - DIFFERENT!";
        } else {
                echo " - same";
        }
        echo "\n";
}

For all of these true ASCII characters, there's no point in transcoding them.

HOWEVER, if by "ASCII" you mean extended ASCII and are talking about accented characters and the like, then you are heading for trouble, because that term does not describe any single, definitive character set. You'll notice that in the list of supported character encodings for PHP's Multibyte String extension, the acronym ASCII occurs only once, and that is for ASCII itself.

To answer your questions more precisely:

If I use the function mb_convert_encoding() to convert an ASCII encoded string in PHP to a UTF-8 string, then concatenate it with an ASCII encoded string, what encoding is it?

The resulting string is both ASCII and UTF-8, because both encoding schemes use identical byte encodings for those 128 characters.
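As a rough check of that claim, the sketch below converts one ASCII string, concatenates it with another, and validates the result under both encodings with mb_check_encoding(). The sample strings are made up for illustration, and the mbstring extension is assumed to be loaded.

```php
<?php
// Hypothetical sample strings; both contain only 7-bit ASCII bytes.
$converted = mb_convert_encoding("Hello, ", "UTF-8", "ASCII");
$plain = "world";
$joined = $converted . $plain;

// The conversion should leave the bytes untouched.
var_dump($converted === "Hello, ");

// The concatenated byte sequence validates under both encodings.
var_dump(mb_check_encoding($joined, "ASCII"));
var_dump(mb_check_encoding($joined, "UTF-8"));
```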

Are there any negative consequences for doing this?

There should be no negative consequences under any circumstances if the characters are in fact true ASCII characters.

If, on the other hand, the strings include some accented character like Å or õ and some sloppy coder is calling this "extended ASCII", then you might have problems. Those characters have different encodings in the Latin-1 and UTF-8 encoding schemes, for instance.
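To make that difference concrete, here is a small sketch that compares the bytes of Å in Latin-1 (ISO-8859-1) and in UTF-8. It uses byte escapes rather than a literal Å so the result does not depend on the encoding of the source file; the variable names are only for illustration.

```php
<?php
// Å in ISO-8859-1 is the single byte 0xC5.
$latin1 = "\xC5";

// A correctly labeled conversion produces the two-byte UTF-8 form.
$utf8 = mb_convert_encoding($latin1, "UTF-8", "ISO-8859-1");

echo bin2hex($latin1), "\n"; // hex of the Latin-1 bytes
echo bin2hex($utf8), "\n";   // hex of the UTF-8 bytes

// A lone 0xC5 byte is not a valid UTF-8 sequence on its own.
var_dump(mb_check_encoding($latin1, "UTF-8"));
```

The same character therefore occupies one byte in one scheme and two in the other, which is why the label attached to the bytes matters.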

Consider taking a look at this PHP script; it may shake loose some understanding. Ask yourself what it means to convert a character which is NOT ASCII from ASCII to UTF-8. It is not a meaningful conversion, but it does result in a change in this particular script:

$chars = array("Å", "õ");
foreach ($chars as $char) {
        echo $char . " : ";
        $str1 = mb_convert_encoding($char, "UTF-8", "ASCII");
        $str2 = mb_convert_encoding($char, "UTF-8", "ISO-8859-1");
        echo $str1 . " - " . $str2 . " - ";

        if ($char !== $str1) {
                echo " - ASCII DIFFERENT";
        }
        if ($char !== $str2) {
                echo " - LATIN 1 DIFFERENT";
        }
        echo "\n";
}

You might start to get confused at this point. It might help to know that my PHP code in that last script has its own character encoding, which on my workstation happens to be UTF-8. The transformations I've performed are therefore pretty stupid. I'm lying to PHP, saying that these UTF-8 strings are ASCII or Latin-1 and asking PHP to transform them to UTF-8. It performs the transformation as best it can, but we all know that the transformation isn't meaningful.
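The effect of mislabeling can be seen at the byte level. The following sketch assumes the input is Å already encoded as UTF-8 (written as byte escapes so the file's own encoding does not matter) and tells PHP it is Latin-1, which is the same kind of lie described above. The variable names are illustrative.

```php
<?php
// Å already encoded as UTF-8: two bytes.
$utf8 = "\xC3\x85";

// Mislabel it as ISO-8859-1: each byte is treated as its own character
// and re-encoded, so the string grows (often called double encoding).
$mislabeled = mb_convert_encoding($utf8, "UTF-8", "ISO-8859-1");

echo bin2hex($utf8), "\n";
echo bin2hex($mislabeled), "\n";

// The output is still valid UTF-8, so validation alone will not catch
// the mistake; it simply no longer represents the character Å.
var_dump(mb_check_encoding($mislabeled, "UTF-8"));
var_dump($mislabeled === $utf8);
```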

I hope you can appreciate what I'm getting at here. Every time you see a character on a computer, it has some encoding. Whether or not there are any negative consequences depends on how you treat the data that comes to you, what transformations you perform on it, and what you intend to do with it later.

It's helpful to think of a chain of custody. Where did your data come from? What encoding did they use? Is that what I'm using on my system? Where am I sending this data? Does it need to be converted? You should also be careful to specify character sets for all these things:

  • data you receive from clients
  • form submissions to your website
  • display of HTML on your website
  • operations on text strings in your applications
  • character encoding of your connection to a database, character encoding of the tables in your database, and encodings of the columns in those tables
  • character encoding of stored data
  • email character encoding
  • character encoding of data submitted to an API

And so on.

General rule of thumb: use UTF-8 for everything you possibly can.

ASCII is a subset of UTF-8, so an ASCII string is a valid UTF-8 string. Concatenating two UTF-8 strings is unambiguous.
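By contrast, concatenating a UTF-8 string with a string in some other encoding is where things go wrong. The sketch below uses made-up sample data: it joins a UTF-8 string with a Latin-1 string as-is, then again after converting the Latin-1 part. It assumes you actually know the second string is ISO-8859-1, since PHP cannot tell you that reliably.

```php
<?php
$utf8Part   = "caf\xC3\xA9 "; // "café " encoded as UTF-8
$latin1Part = "cr\xE8me";     // "crème" encoded as ISO-8859-1

// Naive concatenation mixes two encodings in one byte string.
$mixed = $utf8Part . $latin1Part;
var_dump(mb_check_encoding($mixed, "UTF-8"));

// Converting the Latin-1 part first gives a consistent UTF-8 string.
$clean = $utf8Part . mb_convert_encoding($latin1Part, "UTF-8", "ISO-8859-1");
var_dump(mb_check_encoding($clean, "UTF-8"));
```

The concatenation operator only joins bytes. Keeping every string in one known encoding before joining is what makes the result meaningful.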
