Replacing ’ character in PHP

I'm having a hard time trying to replace this weird right single quote character. I'm using str_replace like this:

str_replace("’", '\u1234', $string);

It looks like I cannot figure out what character the quote really is. Even when I copy paste it directly from PHPMyAdmin it still doesn't work. Do I have to escape it somehow?

The character: http://www.lukomon.com/Afbeelding%204.png

  • MySQL Charset: UTF-8 Unicode (utf8)
  • MySQL Collations: utf8_unicode_ci
  • <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

EDIT: It turned out to be a Microsoft left single quote, which I could replace with this function from Phill Pafford's comment. Not sure which answer I should mark now.

This has happened to me too. A couple of things:

  • Use htmlentities function for your text

    $my_text = htmlentities($string, ENT_QUOTES, 'UTF-8');

More info about the htmlentities function.

  • Use proper document type, this did the trick for me.

    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

  • Use utf-8 encoding type in your page:

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

Here is the final prototype for your page:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<title>Untitled Document</title>
</head>    
<body>

<?php     
    // your code related to database        
    $my_text = htmlentities($string, ENT_QUOTES, 'UTF-8');    
?>

</body>
</html>

If you want to replace it however, try the mb_ereg_replace function.

Example:

mb_internal_encoding("UTF-8");
mb_regex_encoding("UTF-8");

$my_text = mb_ereg_replace("’","'", $string);

I had the same issue and found this to work:

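function replace_rsquote($haystack, $replacewith) {
   // 226 (0xE2) is the first of the three UTF-8 bytes of ’ (E2 80 99).
   // Note: any other character whose first byte is 0xE2 would match too.
   $pos = strpos($haystack, chr(226));
   if ($pos !== false) {
       // Replace all three bytes of the first match only.
       return substr_replace($haystack, $replacewith, $pos, 3);
   }
   return $haystack;
}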

Example:

echo replace_rsquote("Nick’s","'"); //Nick's

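To find what character it is, run it through the ord function, which gives you the value of the first byte of the string. For a multibyte UTF-8 character that is only the lead byte, not the whole character:

echo ord('’'); // 226

226 (0xE2) is the lead byte of the UTF-8 sequence E2 80 99, so you can match the whole sequence like this:

str_replace("\xE2\x80\x99", "'", $string);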
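
To see every byte rather than just the first, something like the following may help. It is a minimal sketch that assumes the data is UTF-8; the variable names are illustrative, and the "\u{2019}" escape is used so the result does not depend on how the script file is saved.

```php
<?php
// Assumption: UTF-8 data. "\u{2019}" (PHP 7.0+) is the right single quotation mark.
$quote = "\u{2019}";

// ord() only looks at the first byte of its argument.
$firstByte = ord($quote);                      // 226 (0xE2)

// bin2hex() shows every byte, which identifies the character unambiguously.
$allBytes = bin2hex($quote);                   // "e28099"

// mb_ord() (PHP 7.2+, mbstring) returns the Unicode code point instead.
$codePoint = dechex(mb_ord($quote, 'UTF-8'));  // "2019"

echo $firstByte, ' ', $allBytes, ' U+', $codePoint, "\n"; // 226 e28099 U+2019
```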

To replace it:

If your script file is encoded in the same encoding as the data you are trying to do the replacement in, it should work the way you posted it. If you're working with UTF-8 data, make sure the script is encoded in UTF-8 and it's not your editor silently transliterating the character when you paste it.
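
A small sketch of what such a mismatch looks like, assuming the data arrives in Windows-1252 (where the "Microsoft" right single quote is the single byte 0x92) while the needle is UTF-8. The sample string is made up for illustration.

```php
<?php
// Hypothetical input: "Nick’s" encoded as Windows-1252.
$cp1252 = "Nick\x92s";

// A UTF-8 needle (bytes E2 80 99) never matches that byte, so nothing changes.
$unchanged = str_replace("\u{2019}", "'", $cp1252);

// Convert the data to UTF-8 first, then the same call works.
$utf8  = mb_convert_encoding($cp1252, 'UTF-8', 'Windows-1252');
$fixed = str_replace("\u{2019}", "'", $utf8);

var_dump($unchanged === $cp1252); // bool(true)
echo $fixed, "\n";                // Nick's
```

Once needle and haystack share an encoding, plain str_replace is enough, because it compares raw bytes.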

If it won't work, try escaping it as described below and see what code it returns.

To escape it:

If your source file is encoded in UTF-8, this should work:

$string = htmlentities($string, ENT_QUOTES, "UTF-8");

The default character set of html... is ISO-8859-1. Anything differing from that must be explicitly stated.

For more complex character conversion issues, always check out the User Contributed Notes to functions like htmlentities(); there are often real gems to be found there.

In General:

Bobince is right in his comment: systemic character set problems should be sorted out systematically so they don't bite you in the ass, if only by defining which character set is used at every step of the way:

  • How the script file is encoded;
  • How the document is served;
  • How the data is stored in the database;
  • How the database connection is encoded.
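
The four steps above could be pinned down roughly as follows. This is only a sketch: the function names and connection parameters are hypothetical, and it assumes the mysqli extension with the utf8 charset from the question (utf8mb4 would be the choice if 4-byte characters are needed).

```php
<?php
// Step 1 (script file): save this file as UTF-8 in your editor, or keep
// non-ASCII literals as escapes such as "\u{2019}".

// Step 2 (document): declare the charset in the HTTP header.
function send_utf8_header(): void
{
    header('Content-Type: text/html; charset=utf-8');
}

// Step 3 (storage) is set on the tables and columns themselves.
// Step 4 (connection): tell MySQL which charset the client speaks.
// All four parameters are placeholders for your own credentials.
function open_utf8_connection(string $host, string $user, string $pass, string $db): mysqli
{
    $mysqli = new mysqli($host, $user, $pass, $db);
    $mysqli->set_charset('utf8');
    return $mysqli;
}
```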

If you are using non-ASCII characters in your PHP code, you need to make sure that you're using the same character encoding as in the data you are processing. Your attempt probably fails because you are using a different character encoding in your PHP script than in $string.

Additionally, if you're using a multibyte character encoding such as UTF-8, you should also use the multibyte aware string functions.
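
To illustrate the difference, here is a short sketch comparing byte-oriented and multibyte-aware functions on a made-up UTF-8 string. It assumes the mbstring extension is loaded.

```php
<?php
$name = "Nick\u{2019}s"; // UTF-8: the quote occupies 3 bytes

echo strlen($name), "\n";              // 8 (bytes)
echo mb_strlen($name, 'UTF-8'), "\n";  // 6 (characters)

// Byte-oriented slicing can cut the quote in half; mb_substr() cannot.
$broken = substr($name, 0, 5);              // "Nick" plus one stray byte
$whole  = mb_substr($name, 0, 5, 'UTF-8');  // "Nick" plus the complete quote

var_dump(mb_check_encoding($broken, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($whole, 'UTF-8'));  // bool(true)
```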

Gumbo said it right:
- save your script as a UTF-8 file
- and use http://php.net/mbstring (as Sarfraz pointed out in his last example)

Why not run the string through htmlspecialchars() and output it to see what it converts that character to, so you know what to use as the replacement expression?
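
One caveat worth checking before relying on this: for a UTF-8 curly quote, htmlspecialchars() leaves the character as it is, while htmlentities() is the function that turns it into a named entity. A quick sketch, assuming UTF-8 input:

```php
<?php
$quote = "\u{2019}"; // right single quotation mark, UTF-8

// htmlspecialchars() only converts &, <, > and the ASCII quotes " and '.
$special = htmlspecialchars($quote, ENT_QUOTES, 'UTF-8');

// htmlentities() converts every character that has a named HTML entity.
$entity = htmlentities($quote, ENT_QUOTES, 'UTF-8');

var_dump($special === $quote); // bool(true)
echo $entity, "\n";            // &rsquo;
```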

This character you have is the Right Single Quotation Mark.

To replace it with a pattern, you'll want to do something like this:

$string = preg_replace( "/\\x{2019}/u", 'replacement', $string );

But that really only addresses the symptom. The problem is that you don't have consistent use of character encodings throughout your application, as others have noted.
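
If both the left and right single quotation marks need the same treatment, the pattern can be widened to a character class. This is a sketch with a hypothetical helper name; it assumes the input is valid UTF-8, which the /u modifier requires.

```php
<?php
// Hypothetical helper: turns U+2018 and U+2019 into a plain apostrophe.
function straighten_single_quotes(string $text): string
{
    // preg_replace() returns null when $text is not valid UTF-8;
    // in that case the input is handed back unchanged.
    return preg_replace('/[\x{2018}\x{2019}]/u', "'", $text) ?? $text;
}

echo straighten_single_quotes("\u{2018}Nick\u{2019}s\u{2019}"), "\n"; // 'Nick's'
```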

Don't use any regex functions (preg_replace or mb_ereg_replace). They are way too heavy for this.

str_replace("\u{2019}", "'", $string); // the "\u{2019}" escape needs PHP 7.0+

If your needle is a multibyte character, you may have better luck with this bespoke function:

<?php 
function mb_str_replace($needle, $replacement, $haystack) {
    $needle_len = mb_strlen($needle);
    $replacement_len = mb_strlen($replacement);
    $pos = mb_strpos($haystack, $needle);
    while ($pos !== false)
    {
        $haystack = mb_substr($haystack, 0, $pos) . $replacement
                . mb_substr($haystack, $pos + $needle_len);
        $pos = mb_strpos($haystack, $needle, $pos + $replacement_len);
    }
    return $haystack; 
} 
?>

credit for this last function: http://www.php.net/manual/en/ref.mbstring.php#86120

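You can get the character's byte value with ord, then replace it with your desired character. This only works when the character is a single byte, as it is in Windows-1252:

$asciicode = ord('’'); // 146 when the script file is saved as Windows-1252
$stringfixed = str_replace(chr($asciicode), '\'', $string);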
