
Wrapping words in a sentence using regex

I'm converting sentences like:

Phasellus turpis, elit. Tempor et lobortis? Venenatis: sed enim!

to:

_________ ______, ____. ______ __ ________? _________: ___ ____!

using:

utf8_encode(preg_replace("/[^.,:;!?¿¡ ]/", "_", utf8_decode($ss->phrase) ))

But I'm facing a problem: Google is indexing all those empty words as keywords. I'd like to convert the original strings to something invisible to Google, like:

<span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span> <span>&nbsp;&nbsp;&nbsp;&nbsp;</span>, ....

using:

.parent span { text-decoration:underline; }

that is, wrapping words inside span tags, replacing each word's characters with &nbsp; and leaving the special characters .,:;!?¿¡ and spaces untouched.

Can this be done with a regex? I currently solve it with a not very efficient loop that scans every character of the string, and I have to process many sentences per page.

Use preg_replace_callback and have the callback build the appropriate replacement. Something along these lines (untested):

function replacer($match) {
    return "<span>".str_repeat("&nbsp;",strlen($match[1]))."</span>";
}

// Note the added () and + in the regex: the + makes each match a whole word,
// and the () captures it as $match[1]
$result = utf8_encode(preg_replace_callback("/([^.,:;!?¿¡ ]+)/", "replacer", utf8_decode($ss->phrase)));
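
The callback above depends on utf8_decode() so that strlen() counts one byte per character, and utf8_encode()/utf8_decode() are deprecated as of PHP 8.2. A variant that works on the UTF-8 string directly might look like the sketch below. The function name blank_words is hypothetical, and the sketch assumes the mbstring extension is loaded and the input is valid UTF-8.

```php
<?php
// Hypothetical helper; the character class is the one from the question.
function blank_words(string $phrase): string
{
    // The /u modifier makes PCRE treat pattern and subject as UTF-8,
    // so accented letters and ¿ ¡ each count as one character.
    $result = preg_replace_callback(
        '/[^.,:;!?¿¡ ]+/u',
        function (array $m): string {
            // mb_strlen counts characters rather than bytes.
            $length = mb_strlen($m[0], 'UTF-8');
            return '<span>' . str_repeat('&nbsp;', $length) . '</span>';
        },
        $phrase
    );

    // preg_replace_callback returns null on failure (e.g. invalid UTF-8);
    // fall back to the unchanged input in that case (an assumption).
    return $result ?? $phrase;
}

echo blank_words('Tempor et lobortis?'), PHP_EOL;
```

Since the whole sentence is handled by a single regex pass, this should avoid the per-character PHP loop mentioned in the question.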

Another option is to wrap the words in spans with a plain preg_replace:

$yourphrase = preg_replace('/([^\W]+)/si', '<span>$1</span>', $yourphrase);

This wraps every "_" word in a span.

In my opinion you need a two-step procedure here: first convert the letters to underscores (which apparently already works), then wrap the "_" words in spans (with my regex).
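
Put together, the two-step procedure could be sketched as follows. The function name underscore_then_wrap is hypothetical, step 1 is the replacement from the question with the /u modifier added, and step 2 uses /_+/ in place of the \w-based pattern, which matches the same runs here because only underscores are left after step 1.

```php
<?php
// Hypothetical helper illustrating the two-step procedure.
function underscore_then_wrap(string $phrase): string
{
    // Step 1: turn every character that is not listed punctuation
    // or a space into an underscore (UTF-8 aware via /u).
    $masked = preg_replace('/[^.,:;!?¿¡ ]/u', '_', $phrase) ?? $phrase;

    // Step 2: wrap each run of underscores in a span.
    return preg_replace('/_+/', '<span>$0</span>', $masked) ?? $masked;
}

echo underscore_then_wrap('Sed enim!'), PHP_EOL;
```

Note that the underscores remain in the page text with this approach, so it only adds the spans; it does not by itself replace the characters with &nbsp;.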
