Regex substitution: Replace texts, not codes
I have been trying to solve this regex quiz for days and still cannot get it right. I am close, but it still does not pass.
In an HTML page, replace the text micro with &micro;. Oh, and don't screw up the code: don't replace inside <the tags> or &entities;.
micro -> &micro;
abc micro -> abc &micro;
micromicro -> &micro;&micro;
&micro;micro -> &micro;&micro;
<tag micro /> -> <tag micro />
&micro; -> &micro;
&abcmicro123; -> &abcmicro123;
I tried the following, but it fails on the final &micro; case. Can someone point out what I am missing? Thanks in advance!
((?:\G|\n)(?:.*?&.*?micro.*?;[\s\S]*?|.*?<.*?micro.*?>[\s\S]*?|.)*?)micro
$1&micro;
You can try something like this:
(?:<.*?>|&\w++;)(*SKIP)(*F)|micro
Replacement string:
&micro;
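The (*SKIP)(*F) verbs are a PCRE feature and are not available in JavaScript's RegExp. As a rough sketch of the same idea for that engine, one alternative can consume tags and entities while another captures micro, and a replacement callback decides what to emit. The names pattern and replaceMicro are made up for this illustration and are not part of the answer above.

```javascript
// Emulates the SKIP-FAIL idea without backtracking verbs:
// the first two alternatives consume tags and entities,
// the third one captures the word we actually want to replace.
const pattern = /<[^<>]*>|&\w+;|(micro)/g;

// Hypothetical helper name, chosen for this sketch only.
function replaceMicro(html) {
  // If group 1 did not participate, a tag or entity matched: keep it as is.
  return html.replace(pattern, (match, word) => (word ? `&${word};` : match));
}

const samples = [
  "micro",
  "abc micro",
  "micromicro",
  "&micro;micro",
  "<tag micro />",
  "&micro;",
  "&abcmicro123;",
];

for (const s of samples) {
  console.log(`${s} -> ${replaceMicro(s)}`);
}
```

Because the tag and entity alternatives come first, the engine consumes them whole before it ever gets a chance to see the micro inside them.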
Use the SKIP-FAIL technique, but match micro as a whole word:
(?:<[^<>]*>|&\w+;)(*SKIP)(*F)|\bmicro\b
Explanation
--------------------------------------------------------------------------------
  (?:                      group, but do not capture:
--------------------------------------------------------------------------------
    <                        '<'
--------------------------------------------------------------------------------
    [^<>]*                   any character except: '<', '>' (0 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    >                        '>'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    &                        '&'
--------------------------------------------------------------------------------
    \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    ;                        ';'
--------------------------------------------------------------------------------
  )                        end of grouping
--------------------------------------------------------------------------------
  (*SKIP)(*F)              skip the match and go on matching from the
                           current location
--------------------------------------------------------------------------------
 |                        OR
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
--------------------------------------------------------------------------------
  micro                    'micro'
--------------------------------------------------------------------------------
  \b                       the boundary between a word char (\w) and
                           something that is not a word char
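One thing worth checking before adopting the \b variant: a word boundary is required on both sides of micro, so micromicro no longer matches, which differs from the expected output listed in the question. The sketch below compares both variants in JavaScript. Since that engine has no (*SKIP)(*F), it uses a replacement callback instead, and makeReplacer, loose and strict are hypothetical names introduced only for this comparison.

```javascript
// Hypothetical helper: builds a replacer that leaves tags and entities
// untouched and turns whatever wordPattern matches into an entity.
function makeReplacer(wordPattern) {
  const re = new RegExp(`<[^<>]*>|&\\w+;|(${wordPattern})`, "g");
  // Group 1 is only set when the word alternative matched.
  return (html) => html.replace(re, (m, word) => (word ? `&${word};` : m));
}

const loose = makeReplacer("micro");         // matches anywhere in text
const strict = makeReplacer("\\bmicro\\b");  // matches whole words only

for (const s of ["abc micro", "micromicro", "microscope"]) {
  console.log(`${s}: loose -> ${loose(s)} | strict -> ${strict(s)}`);
}
```

Which variant is right depends on the requirement: the whole-word form protects words such as microscope, while the plain form is the one that satisfies the micromicro case from the question.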
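var strings = [
  "micro",
  "abc micro",
  "micromicro",
  "&micro;micro",
  "<tag micro />",
  "&micro;",
  "&abcmicro123;"
];
// The negative lookbehind rejects "micro" when it sits inside an open tag
// or an unterminated entity; group 2 captures the word itself.
var re = /(?<!(<[^>]*|&[^;]*))(micro)/g;
strings.forEach(function(str) {
  var result = str.replace(re, '&$2;');
  console.log(str + ' -> ' + result);
});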
Console log output:
micro -> &micro;
abc micro -> abc &micro;
micromicro -> &micro;&micro;
&micro;micro -> &micro;&micro;
<tag micro /> -> <tag micro />
&micro; -> &micro;
&abcmicro123; -> &abcmicro123;
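Explanation:
(?<!...) - negative lookbehind that excludes micro inside tags or entities
(<[^>]*|&[^;]*) - inside the negative lookbehind, skip <...> or &...;
(micro) - captures your name (add more as needed, for example (micro|brewery))
'&$2;' - the replacement turns the captured name into an entity &...;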
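To illustrate the "add more as needed" remark, here is a minimal sketch that builds the capture group from a list. The list contents (copy in particular) and the names names, re and toEntities are assumptions made for this example. The engine must support lookbehind assertions (ES2018 or later).

```javascript
// Hypothetical list of entity names to convert; "copy" is an illustrative
// addition and not part of the original answer.
const names = ["micro", "copy"];

// Same lookbehind as in the answer; only the capture group is generated.
const re = new RegExp(`(?<!(<[^>]*|&[^;]*))(${names.join("|")})`, "g");

// Hypothetical helper name, chosen for this sketch only.
function toEntities(text) {
  // $2 is the captured name, since group 1 lives inside the lookbehind.
  return text.replace(re, "&$2;");
}

console.log(toEntities("micro copy <tag micro /> &copy;"));
```

Note that the names are matched as plain substrings, so a word such as copyright would also be affected. Add \b around the group if whole-word matching is wanted.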