Regular Expression in Notepad++ to replace < and > inside CDATA

I'm using Notepad++ to fix a huge XML export file, and one of the challenges is to replace all < and > characters with &lt; and &gt;. The thing is, I can't simply use the Replace All action, since the XML file is full of < and > characters that must not be changed.

Luckily, all the < and > that I need to change are wrapped in CDATA tags, like this:

<![CDATA[Text here... <span class="vSpecial"><p>Special Offer - more text here!</p></span>]]>

I was wondering if there is a regular expression that identifies < and > inside CDATA content, so I could use Replace All to change only those.

UPDATE

The content of CDATA can contain line breaks.

Code

See regex in use here

<!\[CDATA\[(?:(?!\]\]>).)*?\K(?:(<)|(>))

Replacement: (?{1}&lt;)(?{2}&gt;)

Note: For display purposes, the link above uses \G(?!\A). This is not supported in Notepad++, so it has been dropped from the actual answer. I added it to the link to show what the pattern basically does.

See the Notepad++ documentation for more information. It mentions the following:

For those readers familiar with Perl, \G is not supported.


Results

Before

[Screenshot: before]

After

[Screenshot: after]


Explanation

Click Replace All repeatedly until the message at the bottom shows Replace All: 0 occurrences were replaced. Each pass replaces the first remaining < or > in every CDATA section that is found, so the first pass fixes the first occurrence, the second pass the second, and so on until there are no more matches.
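
The repeated passes can be simulated outside Notepad++. The following is a minimal sketch in Python, not part of the original answer. Python's re module has no \K, so it assumes an equivalent rewrite: the prefix is captured in an extra group and written back by the replacement. The names replace_once and replace_until_done are made up for illustration, and re.DOTALL stands in for Notepad++'s ". matches newline" option so that CDATA content with line breaks is covered.

```python
import re

# No \K in Python's re: group 1 holds the prefix that \K would discard,
# and groups 2 and 3 play the role of the answer's groups 1 and 2.
PATTERN = re.compile(r"(<!\[CDATA\[(?:(?!\]\]>).)*?)(?:(<)|(>))", re.DOTALL)


def replace_once(text):
    """One 'Replace All' pass: fixes at most one bracket per CDATA section."""
    def repl(m):
        # Write the prefix back, then the entity for whichever bracket matched.
        return m.group(1) + ("&lt;" if m.group(2) else "&gt;")
    return PATTERN.subn(repl, text)


def replace_until_done(text):
    """Repeat passes until a pass reports zero replacements."""
    passes = 0
    while True:
        text, count = replace_once(text)
        if count == 0:
            return text, passes
        passes += 1
```

A CDATA section with four brackets needs four productive passes plus one final pass that reports zero, which is why a single Replace All click is not enough.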

Pattern

  • <!\[CDATA\[ Matches <![CDATA[ literally
  • (?:(?!\]\]>).)*? Tempered lazy token: matches any character any number of times, but as few as possible, while ensuring that what follows is not ]]>
  • \K Resets the starting point of the reported match. Any previously consumed characters are no longer included in the final match
  • (?:(<)|(>)) Matches either of the following
    • (<) Captures < literally into capture group 1
    • (>) Captures > literally into capture group 2

Replacement

Notepad++ allows conditional replacements, so (?{1}&lt;) inserts &lt; when capture group 1 matched, and (?{2}&gt;) inserts &gt; when capture group 2 matched.
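
For files processed by a script instead of Notepad++, the same conditional idea can be written as a replacement callback. This is a sketch of a different technique from the answer's: it matches each CDATA section whole and escapes every bracket inside it in one pass, so no repetition is needed. The function names are hypothetical, and it assumes sections are delimited by the first ]]> after each <![CDATA[.

```python
import re

# A whole CDATA section: opener, body (lazy, may span lines), closer.
CDATA_SECTION = re.compile(r"(<!\[CDATA\[)(.*?)(\]\]>)", re.DOTALL)
# Same two capture groups as the tail of the Notepad++ pattern.
BRACKET = re.compile(r"(<)|(>)")


def entity_for(m):
    # Mirrors (?{1}&lt;)(?{2}&gt;): &lt; if group 1 matched, otherwise &gt;.
    return "&lt;" if m.group(1) is not None else "&gt;"


def escape_cdata_brackets(xml_text):
    """Escape < and > inside CDATA bodies, leaving all other markup alone."""
    def fix_section(m):
        return m.group(1) + BRACKET.sub(entity_for, m.group(2)) + m.group(3)
    return CDATA_SECTION.sub(fix_section, xml_text)
```

Tags outside CDATA are never touched because the bracket substitution only runs on the captured body of each section.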
