I'm looking for a regexp for my problem. I have a text (the specification of a product), for example:
length: 20cm; height: 10cm; &laquo;Night&raquo; mode: yes; manufacturer : Sony&copy; manual : yes
The final result should look like this:
<tr><td>length</td><td>20cm</td></tr>
...
<tr><td>manufacturer</td><td>Sony&copy;</td></tr>
So I should replace ":" plus any whitespace (:\s*) with "</td><td>",
and ";" plus any whitespace (;\s*) with "</td></tr><tr><td>",
but not where the ";" is preceded by "&" and Latin letters (&[a-z]+). The point is HTML entities like &nbsp;, &laquo;, &copy; etc., which contain ";".
In other words: match ;\s* but not the ; of &[a-z]+;.
How can I do this?
My regexp in Smarty looks like this: |regex_replace:"/[:]\s*/":"</td><td>"|regex_replace:"/[;]\s*/":"</td></tr><tr><td>" so the only thing left is to keep it from matching inside HTML entities. I tried some combinations with (?!...) but had no success. I'm looking for something like the question RegExp for matching three letters, but not text "BUY".
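To make the problem concrete, here is a small Python sketch of the two naive replacements from the question (Python stands in for Smarty/PHP here; the sample string is an assumption that the raw input contains entities such as &laquo; and &raquo;). Every ";" is treated as a separator, so the entities lose their semicolons and get split across cells.

```python
import re

# Hypothetical raw input: the specification as HTML source, with named entities.
spec = "length: 20cm; height: 10cm; &laquo;Night&raquo; mode: yes"

# Same order as the Smarty chain: first ":" + whitespace, then ";" + whitespace.
naive = re.sub(r":\s*", "</td><td>", spec)
naive = re.sub(r";\s*", "</td></tr><tr><td>", naive)

# &laquo; and &raquo; are broken: their ";" was replaced like a separator.
print(naive)
```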
Use negative look-behinds to find semicolons that are not part of an HTML entity:
(?<!&[a-z]{2})(?<!&[a-z]{3})(?<!&[a-z]{4})(?<!&[a-z]{5});\s*
This regex matches only bare semicolons. Unfortunately, several look-behinds are needed to cover entity names of 2 to 5 letters, because a look-behind must match a fixed-length expression.
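A minimal Python sketch of the look-behind answer (Python's re module accepts the same pattern, since each look-behind is fixed-width). The helper name spec_to_rows and the sample string are illustrative assumptions; entity names longer than 5 letters, or numeric entities like &#169;, are not covered by this pattern.

```python
import re

# A ";" (plus trailing whitespace) NOT preceded by "&" + 2, 3, 4 or 5 lowercase letters.
NAKED_SEMICOLON = re.compile(
    r"(?<!&[a-z]{2})(?<!&[a-z]{3})(?<!&[a-z]{4})(?<!&[a-z]{5});\s*"
)

spec = "length: 20cm; height: 10cm; &laquo;Night&raquo; mode: yes"
parts = NAKED_SEMICOLON.split(spec)  # entities stay intact inside their pair
print(parts)


def spec_to_rows(text: str) -> str:
    """Hypothetical helper: turn 'key: value; key: value' into table rows."""
    rows = NAKED_SEMICOLON.sub("</td></tr><tr><td>", text)
    rows = re.sub(r"\s*:\s*", "</td><td>", rows)  # also trims "manufacturer : ..."
    return "<tr><td>" + rows + "</td></tr>"


print(spec_to_rows(spec))
```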
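If you must use regular expressions, you can do it in steps like this:
1. Replace each HTML entity (& followed by \w characters and ;) with a placeholder, e.g. &copy; to @@@copy###.
2. Replace the remaining : and ; with the <td> markup.
3. Now turn @@@copy### back into &copy;.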
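The placeholder steps can be sketched in Python as follows. The function names protect, restore and spec_to_rows are hypothetical, the @@@/### markers come from the answer, and the sketch assumes the text never contains those markers on its own.

```python
import re


def protect(text: str) -> str:
    """Step 1: hide every entity behind a placeholder, e.g. &copy; -> @@@copy###."""
    return re.sub(r"&(\w+);", r"@@@\1###", text)


def restore(text: str) -> str:
    """Step 3: turn the placeholders back into entities."""
    return re.sub(r"@@@(\w+)###", r"&\1;", text)


def spec_to_rows(spec: str) -> str:
    text = protect(spec)
    # Step 2: with the entities hidden, every remaining ";" and ":" is a separator.
    text = re.sub(r"\s*;\s*", "</td></tr><tr><td>", text)
    text = re.sub(r"\s*:\s*", "</td><td>", text)
    return "<tr><td>" + restore(text) + "</td></tr>"


print(spec_to_rows("length: 20cm; height: 10cm; &laquo;Night&raquo; mode: yes"))
```

This avoids look-behinds entirely, so entity length does not matter, at the cost of two extra passes over the text.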
How about:
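$str = 'length: 20cm; height: 10cm; &laquo;Night&raquo; mode: yes; manufacturer : Sony&copy; manual : yes';
// a look-ahead before ";" can never see "&copy", so match whole entities and skip them with (*SKIP)(*FAIL)
$str = preg_replace('#&[a-z]+;(*SKIP)(*FAIL)|;\s*#', '</td></tr><tr><td>', $str);
$str = preg_replace('#\s*:\s*#', '</td><td>', $str);
$str = '<tr><td>' . $str . '</td></tr>';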
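A variant that needs no look-arounds and handles entity names of any length: match either a complete entity or a separator in a single alternation, and let a replacement function return entities unchanged. This is a Python sketch; TOKEN, _replace and spec_to_rows are illustrative names, not part of any answer above.

```python
import re

# Either a complete named entity (group 1 unset) or a ";"/":" with surrounding whitespace.
TOKEN = re.compile(r"&[a-z]+;|\s*([;:])\s*")


def _replace(match: re.Match) -> str:
    sep = match.group(1)
    if sep is None:  # matched an entity such as &copy;: keep it as-is
        return match.group(0)
    return "</td></tr><tr><td>" if sep == ";" else "</td><td>"


def spec_to_rows(spec: str) -> str:
    return "<tr><td>" + TOKEN.sub(_replace, spec) + "</td></tr>"


print(spec_to_rows("manufacturer : Sony&copy; manual : yes"))
```

Because the regex engine consumes each entity as a whole token, the ";" inside it is never offered to the separator branch.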