
str_replace() not working for the following case

I would like to use str_replace() to wrap span elements around strings in an HTML page in order to highlight them.

However, the following does not work when there is a &nbsp; inside the string. I've tried replacing the &nbsp; with ' ' but this did not help.



You can recreate the problem using the below code:

$str_to_replace = "as a way to incentivize more purchases.";

$replacement = "<span class='highlighter'>as a way to incentivize&nbsp;more purchases.</span>";

$subject = file_get_contents("http://venturebeat.com/2015/11/10/sources-classpass-raises-30-million-from-google-ventures-and-others/");

$output = str_replace($str_to_replace,$replacement,$subject);

.highlighter{
    background-color: yellow;
}

I tried your code and ran into the same problem you did. Interesting, right? The problem is that there is actually another character between the "e" at the end of "incentivize" and the " more". You can see it by splitting $subject into two parts, the text before "to incentivize" and the text after it:

// splits the webpage into two parts
$x = explode('to incentivize', $subject);

// print the char code for the first character of the second string
// (the character right after the second e in incentivize) and also
// print the rest of the webpage following this mystery character
exit("keycode of invisible character: " . ord($x[1]) . " " . $x[1]);

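which prints: keycode of invisible character: 194 Â more ... Look, there's our mystery character, and its first byte has the value 194 (0xC2). ord() only reports the first byte of the string; 0xC2 is the lead byte of the two-byte UTF-8 sequence 0xC2 0xA0, which encodes a non-breaking space.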
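To see the whole byte sequence rather than only its first byte, a small sketch like the one below may help. The $sample string is a made-up stand-in for the fetched page (an assumption, so the script runs without network access); bin2hex() and substr() are standard PHP functions.

```php
<?php
// Hypothetical sample standing in for the downloaded page.
// "\xC2\xA0" is the UTF-8 encoding of U+00A0 (non-breaking space).
$sample = "as a way to incentivize\xC2\xA0more purchases.";

// Split on the same marker text used above.
$parts = explode('to incentivize', $sample);

// ord() looks only at the first byte of the string it is given.
$firstByte = ord($parts[1]);

// bin2hex() on the first two bytes reveals the full two-byte sequence.
$sequence = bin2hex(substr($parts[1], 0, 2));

echo $firstByte, " ", $sequence, "\n"; // prints "194 c2a0"
```

Seeing c2a0 rather than 20 (a plain space) explains why a search string typed with an ordinary space never matches.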

Perhaps this website embeds these characters to make it difficult to do exactly what you're doing, or perhaps it's just a bug. In any case, you can use preg_replace instead of str_replace and change $str_to_replace like so:

// \. matches the trailing period, so the replacement does not leave a second one behind
$str_to_replace = "/as a way to incentivize(.*?)more purchases\./";

$replacement = "<span class='highlighter'>as a way to incentivize more purchases.</span>";

$subject = file_get_contents("http://venturebeat.com/2015/11/10/sources-classpass-raises-30-million-from-google-ventures-and-others/");

$output = preg_replace($str_to_replace,$replacement,$subject);

and now this does what you want. The (.*?) absorbs the mysterious hidden character. You can probably tighten this regular expression further, or at least cap the gap at a maximum number of characters with .{0,5} (note that [.] inside brackets would match only a literal dot), but in either case you likely want to stay flexible.
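As a rough illustration of the capped variant, here is a minimal sketch. The $subject string is an invented stand-in for the page, and the cap of five is an arbitrary choice, not something the site requires.

```php
<?php
// Hypothetical sample standing in for the downloaded page; the gap between
// the two words is a UTF-8 non-breaking space (bytes 0xC2 0xA0).
$subject = "<p>Offered as a way to incentivize\xC2\xA0more purchases.</p>";

// .{0,5}? allows at most five arbitrary bytes between the two words.
// \. matches the literal period so it is not duplicated in the output.
$pattern = '/as a way to incentivize.{0,5}?more purchases\./';

$replacement = "<span class='highlighter'>as a way to incentivize more purchases.</span>";

$output = preg_replace($pattern, $replacement, $subject);

echo $output, "\n";
```

Without the u modifier the dot matches single bytes, so the two-byte sequence fits comfortably within the cap.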

You can do this a much simpler way with this:

$subject = str_replace("\xc2\xa0", " ", $subject);

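This replaces every UTF-8 encoded non-breaking space (the bytes 0xC2 0xA0, the character that &nbsp; stands for) with a standard space.

You can then continue with your original code, as long as you also replace any &nbsp; in your own search and replacement strings with a regular space.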
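Put together, the normalize-then-replace approach might look like the sketch below. The $subject string is a made-up stand-in for the page. Also replacing the literal "&nbsp;" entity is an extra assumption on my part, in case the markup contains the entity rather than the raw character.

```php
<?php
// Hypothetical sample standing in for the downloaded page.
$subject = "<p>Offered as a way to incentivize\xC2\xA0more purchases.</p>";

// Search text written with ordinary spaces only.
$needle = "as a way to incentivize more purchases.";
$replacement = "<span class='highlighter'>" . $needle . "</span>";

// Turn UTF-8 non-breaking spaces (and, as an assumption, literal &nbsp;
// entities) into plain spaces before searching.
$normalized = str_replace(["\xC2\xA0", "&nbsp;"], " ", $subject);

$output = str_replace($needle, $replacement, $normalized);

echo $output, "\n";
```

Keep in mind that this changes every non-breaking space in the page, not only the one inside the highlighted phrase.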

