
Replace semicolon (;) but not HTML entities (&nbsp; etc.)

I'm looking for a regexp for my issue. I have a text (the specification of a product), for example:

length: 20cm; height: 10cm; &laquo;Night&raquo; mode: yes;&nbsp;manufacturer : Sony&copy; manual&nbsp;:&nbsp;yes

The final result should look like this

<tr><td>length</td><td>20cm</td></tr>
...
<tr><td>manufacturer</td><td>Sony&copy;</td></tr>

So I should replace ":" plus any following whitespace (\s*) with "</td><td>", and ";" plus any following whitespace (\s*) with "</td></tr><tr><td>", but not when the ";" is preceded by an & sign and Latin letters ([a-z]+). The point is that HTML entities such as &nbsp;, &laquo; and &copy; contain a ";".

In other words: ;\s* but not &[a-z]+[;] .

How can I do this?

My regexp in Smarty looks like this: |regex_replace:"/[:]\s*/":"</td><td>"|regex_replace:"/[;]\s*/":"</td></tr><tr><td>" so the only thing left is to exclude the HTML entities. I tried some combinations with (?!...) but had no success. I'm looking for something like this: RegExp for matching three letters, but not text "BUY".

Use a negative look-behind to find semicolons not part of an encoded character:

(?<!&[a-z]{2})(?<!&[a-z]{3})(?<!&[a-z]{4})(?<!&[a-z]{5});\s*

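This regex matches only bare semicolons. Unfortunately, the multiple look-behinds are required to cover all possibilities, because a negative look-behind demands a fixed-length expression. As written, it covers entity names of two to five letters; a longer name such as &middot; would need a further look-behind.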
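As a rough check of the pattern above, here is a minimal sketch in Python rather than Smarty or PHP (an assumption made for illustration; Python's `re` module has the same fixed-width look-behind restriction). The names `BARE_SEMICOLON`, `spec` and `rows` are hypothetical, and the sample string is the one from the question.

```python
import re

# Same pattern as above: one fixed-width negative look-behind
# per entity-name length (2, 3, 4 and 5 letters).
BARE_SEMICOLON = re.compile(
    r"(?<!&[a-z]{2})(?<!&[a-z]{3})(?<!&[a-z]{4})(?<!&[a-z]{5});\s*"
)

spec = ("length: 20cm; height: 10cm; &laquo;Night&raquo; mode: yes;"
        "&nbsp;manufacturer : Sony&copy; manual&nbsp;:&nbsp;yes")

# Only semicolons that do not close an entity become row breaks.
rows = BARE_SEMICOLON.sub("</td></tr><tr><td>", spec)
print(rows)

# Limitation: a six-letter entity name is not protected by this pattern.
too_long = BARE_SEMICOLON.search("a&middot;b")
print(too_long is not None)
```

The three bare semicolons in the sample are replaced, while the entities keep theirs. The last check shows the caveat: the semicolon of `&middot;` still matches, because no look-behind covers six letters.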


If you must use regular expressions, you can proceed in steps like this:

  1. remove all characters except \w : ; &
  2. replace every &copy; with @@@copy###
  3. now you can replace : and ; with the <td> markup
  4. replace every @@@copy### back with &copy;
  5. remove all &nbsp;
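Steps 2 to 4 can be sketched as follows, in Python for illustration. The function name `spec_to_cells` is hypothetical, the sketch assumes the input never contains the marker text `@@@` or `###` on its own, and it leaves out steps 1 and 5 because they change the visible text.

```python
import re

def spec_to_cells(spec: str) -> str:
    # Step 2: hide each entity, e.g. "&copy;" becomes "@@@copy###".
    protected = re.sub(r"&([a-z]+);", r"@@@\1###", spec)
    # Step 3: every remaining ":" or ";" is now a real separator.
    protected = re.sub(r"\s*:\s*", "</td><td>", protected)
    protected = re.sub(r"\s*;\s*", "</td></tr><tr><td>", protected)
    # Step 4: turn the placeholders back into entities.
    return re.sub(r"@@@([a-z]+)###", r"&\1;", protected)

html = spec_to_cells("length: 20cm; manufacturer : Sony&copy;")
print(html)
# length</td><td>20cm</td></tr><tr><td>manufacturer</td><td>Sony&copy;
```

Because the entities are hidden before any separator is replaced, this variant does not depend on the length of the entity name.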

How about:

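$str = 'length: 20cm; height: 10cm; &laquo;Night&raquo; mode: yes;&nbsp;manufacturer : Sony&copy; manual&nbsp;:&nbsp;yes';
// A look-ahead placed before ";" cannot see the "&name" that precedes it, so use fixed-length look-behinds.
$str = preg_replace('#(?<!&[a-z]{2})(?<!&[a-z]{3})(?<!&[a-z]{4})(?<!&[a-z]{5});\s*#', '</td></tr><tr><td>', $str);
$str = preg_replace('#:\s*#', '</td><td>', $str);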
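Another way to leave entities alone, sketched here in Python rather than PHP, is to match entities and bare semicolons with one pattern and decide in a callback which of the two was found. This is not what the snippet above does; it is a swapped-in technique, and the names `TOKEN` and `split_rows` are hypothetical.

```python
import re

# Either a whole entity such as "&nbsp;" (captured in group 1),
# or a bare ";" followed by optional whitespace.
TOKEN = re.compile(r"(&[a-z]+;)|;\s*")

def split_rows(spec: str) -> str:
    def repl(match: re.Match) -> str:
        # Entities are returned unchanged; bare semicolons become row breaks.
        if match.group(1):
            return match.group(1)
        return "</td></tr><tr><td>"
    return TOKEN.sub(repl, spec)

out = split_rows("a: 1; b: &middot;2; c: 3")
print(out)
# a: 1</td></tr><tr><td>b: &middot;2</td></tr><tr><td>c: 3
```

Since the entity alternative consumes its own semicolon, entity names of any length are handled without look-behinds.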
