Wrapping words in a sentence using regex

I'm converting sentences like:

Phasellus turpis, elit. Tempor et lobortis? Venenatis: sed enim!

to:

_________ ______, ____. ______ __ ________? _________: ___ ____!

using:

utf8_encode(preg_replace("/[^.,:;!?¿¡ ]/", "_", utf8_decode($ss->phrase) ))

But I'm facing a problem: Google is indexing all those empty words as keywords. I'd like to convert the original strings to something invisible to Google, like:

<span>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;</span> <span>&nbsp;&nbsp;&nbsp;&nbsp;</span>, ...

using:

.parent span { text-decoration:underline; }

That is, wrapping words inside span tags, replacing each character of a word with &nbsp;, and leaving the special characters .,:;!?¿¡ and the space untouched.

Is this possible to solve using a regex? I actually solved this with a not very efficient loop that scans every character of the string, but I have to process many sentences per page.

Use preg_replace_callback and have the callback create the appropriate replacement. Something along the lines of (untested):

function replacer($match) {
    return "<span>".str_repeat("&nbsp;",strlen($match[1]))."</span>";
}

// Note the addition of the () and the + near the end of the regex
utf8_encode(preg_replace_callback("/([^.,:;!?¿¡ ]+)/", "replacer", utf8_decode($ss->phrase) ))

$yourphrase = preg_replace('/([^\W]+)/si', '<span>$1</span>', $yourphrase);

This will wrap all the "_" words in spans.

IMHO you need a two-step procedure here: first convert the letters to underscores (which apparently already works), then wrap the "_" words in a span (with my regex).
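For comparison, here is a minimal sketch of both ideas from this thread: the single-pass callback and the two-step underscore variant. The function names (mask_with_nbsp, mask_with_underscores) and the WORD_PATTERN constant are hypothetical, not from the original posts. The sketch assumes the input and the source file are valid UTF-8 and uses the /u modifier instead of the utf8_decode()/utf8_encode() round trip, which is a swapped-in technique rather than what the posters wrote. The second step also matches runs of underscores directly instead of using [^\W]+.

```php
<?php
// Hypothetical helpers; assumes $phrase and this source file are valid UTF-8.
// The /u modifier makes PCRE treat pattern and subject as UTF-8.

// One or more characters that are not in the punctuation/space set.
const WORD_PATTERN = '/[^.,:;!?¿¡ ]+/u';

// Single pass: each word becomes a span holding one &nbsp; per character.
function mask_with_nbsp(string $phrase): string
{
    return preg_replace_callback(
        WORD_PATTERN,
        function (array $match): string {
            // Replace every UTF-8 character of the word with one &nbsp; entity.
            return '<span>' . preg_replace('/./us', '&nbsp;', $match[0]) . '</span>';
        },
        $phrase
    );
}

// Two steps: mask each word character with "_", then wrap each underscore run.
function mask_with_underscores(string $phrase): string
{
    $masked = preg_replace('/[^.,:;!?¿¡ ]/u', '_', $phrase);
    return preg_replace('/(_+)/', '<span>$1</span>', $masked);
}

echo mask_with_nbsp('Tempor et lobortis?'), "\n";
echo mask_with_underscores('Tempor et lobortis?'), "\n";
```

Only the first function produces markup without visible underscores, which is what the question asks for. The second keeps the underscores in the page text and only adds the spans around them.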
