PHP utf8 encoding and decoding

I have the following code in PHP:

$test = "\151\163\142\156";
echo utf8_decode($test);
var_dump($test);

and I get the following result:

isbn
string(4) "isbn"

I also read some text from a .txt file that contains the literal text \151\163\142\156:

$all_text = file_get_contents('test.txt');
var_dump($all_text);

Result:

string(16) "\151\163\142\156"

I have the following questions:

  1. How can I utf8 decode the second text so that I get the isbn result?

  2. How can I encode isbn to get \151\163\142\156?

EDIT

(from comments)

I tried everything with iconv and encode, but nothing worked. The text from the .txt file is string(16) and not string(4) so I can encode it. The txt file is saved from Sublime with Western (ISO 8859-1) encoding.

Try using stripcslashes:

<?php

$test = "\151\163\142\156";
echo utf8_decode( $test );                         // "isbn"
var_dump( $test );

echo "<br/><br/><br/>";

$all_text = file_get_contents( "test.txt" );
echo utf8_decode( $all_text ) .                    // "\151\163\142\156"
     "<br/>" .
     utf8_decode( stripcslashes( $all_text ) );    // "isbn"
var_dump( stripcslashes( $all_text ) );

?>

Tested with this file:

This is some text :

\151\163\142\156

And this is more text!!!

Next is how to convert chars to codes:

<?php
$test = "isbn";
$coded = "";
for ( $i = 0; $i < strlen( $test ); $i++ ) // PROCESS EACH CHAR IN STRING.
  $coded .= "\\" . decoct( ord( $test[ $i ] ) ); // CHAR CODE TO OCTAL.

echo $coded .                           // "\151\163\142\156"
     "<br/>" .
     stripcslashes( $coded );           // "isbn".
?>

Let's make it more general with a function that we can call anywhere:

<?php
function code_string ( $s )
{ $coded = "";
  for ( $i = 0; $i < strlen( $s ); $i++ )
    $coded .= "\\" . decoct( ord( $s[ $i ] ) );
  return $coded;
}

$x = code_string( "isbn" );
echo $x .                           // "\151\163\142\156"
     "<br/>" .
     stripcslashes( $x );           // "isbn".
?>

This has absolutely nothing to do with UTF-8 encoding. Forget about that part entirely. utf8_decode doesn't do anything in your code. iconv is entirely unrelated.

It has to do with PHP string literal interpretation. The \... in "\151\163\142\156" is a special PHP string literal escape sequence:

\[0-7]{1,3}
the sequence of characters matching the regular expression is a character in octal notation, which silently overflows to fit in a byte (e.g. "\400" === "\000")

http://php.net/manual/en/language.types.string.php#language.types.string.syntax.double

This explains why it works when written in a PHP string literal and doesn't work when reading from an outside source: the external text read through file_get_contents is not interpreted as PHP code. Simply do echo "\151\163\142\156" and you'll see "isbn" without any other conversions necessary.
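The difference can be seen without a file at all. The following is a minimal sketch in which a single-quoted literal stands in for the bytes that file_get_contents would return; the variable names are only illustrative.

```php
<?php
// Double quotes: PHP interprets each \NNN octal escape when parsing the literal.
$literal = "\151\163\142\156";

// Single quotes: the backslashes stay literal, just like bytes read from a file.
$raw = '\151\163\142\156';

var_dump(strlen($literal));    // int(4)
var_dump(strlen($raw));        // int(16)
var_dump($literal === 'isbn'); // bool(true)
```

The 16 bytes in $raw are the same 16 characters the question reports for the file contents.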

To manually convert the individual escape sequences in the string \151\163\142\156 to their character equivalents (really: their byte equivalents):

$string = '\151\163\142\156';  // note: single quotes cause no interpretation
echo preg_replace_callback('/\\\\([0-7]{1,3})/', function ($m) {
    return chr(octdec($m[1]));
}, $string);
// isbn

stripcslashes happens to include this functionality, but it also does a whole lot of other things which may be undesired.
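As a rough illustration of those "other things", the sketch below runs both approaches on text that also contains a backslash which is not part of an octal escape. The helper name decode_octal_escapes and the sample string are made up for this example.

```php
<?php
// Hypothetical helper: converts only \NNN octal escapes and leaves everything else alone.
function decode_octal_escapes(string $s): string
{
    return preg_replace_callback('/\\\\([0-7]{1,3})/', function ($m) {
        return chr(octdec($m[1]));
    }, $s);
}

// Single quotes keep the backslashes literal, as if the text came from a file.
$raw = '\151\163\142\156 in C:\docs';

echo stripcslashes($raw), "\n";        // isbn in C:docs  (backslash before "d" is dropped)
echo decode_octal_escapes($raw), "\n"; // isbn in C:\docs (only octal escapes are converted)
```

stripcslashes also turns two-character sequences such as \n or \t into the corresponding control characters, so it is a safe choice only when the input is known to contain nothing but the intended escapes.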

The other way around:

$string = 'isbn';
echo preg_replace_callback('/./', function ($m) {
    return '\\' . decoct(ord($m[0]));
}, $string);
// \151\163\142\156
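The two directions can be wrapped into a pair of functions and checked with a round trip. This is only a sketch: the names octal_encode and octal_decode are hypothetical, and the "s" modifier is an addition of this example so that "." also matches newline bytes.

```php
<?php
// Hypothetical helper names; the regexes follow the snippets above,
// with the "s" modifier added so that "." also matches newlines.
function octal_encode(string $s): string
{
    return preg_replace_callback('/./s', function ($m) {
        return '\\' . decoct(ord($m[0])); // byte value -> backslash + octal digits
    }, $s);
}

function octal_decode(string $s): string
{
    return preg_replace_callback('/\\\\([0-7]{1,3})/', function ($m) {
        return chr(octdec($m[1]));        // octal digits -> single byte
    }, $s);
}

$encoded = octal_encode('isbn');
echo $encoded, "\n";               // \151\163\142\156
echo octal_decode($encoded), "\n"; // isbn
```

Both functions work byte by byte, so a multibyte UTF-8 character is encoded as several octal escapes and restored to the same bytes on decoding.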
