Regex to delete everything except XML

Question

I need help with a regex for Notepad++ that matches everything except the XML.

The regex I'm using: (!?\<.*\>) <-- I want the opposite of this (in the first three lines).

The example text:

[20173003] This text is what I want to delete [<Person><Name>Foo</Name><Surname>Bar</Surname></Person>], and this text too.
[20173003] This is another text to delete [<Person><Name>Bar</Name><Surname>Foo</Surname></Person>]
[20173003] This text too... [<Person><Name>Lorem</Name><Surname>Ipsum</Surname></Person>], delete me!
[20173003] But things like this make the regex to fail < [<Person><Name>Lorem</Name><Surname>Ipsum</Surname></Person>], or this>

Expected result:

<Person><Name>Foo</Name><Surname>Bar</Surname></Person>
<Person><Name>Bar</Name><Surname>Foo</Surname></Person>
<Person><Name>Lorem</Name><Surname>Ipsum</Surname></Person>
<Person><Name>Lorem</Name><Surname>Ipsum</Surname></Person>

Thanks in advance!

Answer 1

This is not perfect, but it should work with your input, which looks quite simple and well structured.

If you only need to handle a single, unnested <Person> tag, you may use the simple regex (<Person>.*?</Person>)|. (it matches and captures any <Person> element into Group 1, and matches any other single character otherwise) and replace with the conditional replacement pattern (?{1}$1\n:) (it reinserts the Person element followed by a newline, or replaces the match with an empty string).
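To cross-check this find/replace outside Notepad++, it can be sketched in Python. This is only an approximation under stated assumptions: Python's re module has no (?{1}...:...) conditional replacement syntax, so a small replacement function (the hypothetical keep_person) plays that role, and re.DOTALL is assumed as the equivalent of Notepad++'s ". matches newline" option, so that line breaks are deleted too.

```python
import re

# Same pattern as in the answer: Group 1 captures a whole <Person> element,
# the "|." branch matches any other single character.
# re.DOTALL (assumed setting) lets "." match line breaks as well.
PERSON_OR_CHAR = re.compile(r"(<Person>.*?</Person>)|.", re.DOTALL)


def keep_person(match: re.Match) -> str:
    """Hypothetical stand-in for the conditional replacement (?{1}$1\\n:)."""
    if match.group(1) is not None:
        return match.group(1) + "\n"  # Group 1 matched: keep it plus a newline
    return ""  # otherwise: delete the single matched character


def extract_persons(text: str) -> str:
    return PERSON_OR_CHAR.sub(keep_person, text)


if __name__ == "__main__":
    line = "[20173003] This text too... [<Person><Name>Lorem</Name><Surname>Ipsum</Surname></Person>], delete me!"
    print(extract_persons(line), end="")
    # <Person><Name>Lorem</Name><Surname>Ipsum</Surname></Person>
```

Because of the lazy .*?, each match stops at the first </Person>, which is exactly why this version is limited to unnested elements.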

To make it a bit more generic, you may capture the opening and the corresponding closing XML tags with a recursion-based Boost regex and the appropriate conditional replacement pattern:

Find What: (<(\w+)[^>]*>(?:(?!</?\2\b).|(?1))*</\2>)|.
Replace With: (?{1}$1\n:)
. matches newline: ON
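Python's standard re module does not support the (?1) subroutine call, so this recursive pattern cannot be run there as is. The sketch below is a hand-written approximation of what it does, with hypothetical helper names (match_element, keep_elements): it follows the first path the regex would try (skip ordinary characters; at a "<name" or "</name" for the current tag name, either accept the exact closing tag or recurse into a nested element). It does not replay every backtracking alternative Boost might explore, so on malformed markup its result may differ from Notepad++.

```python
import re

# Opening tag, as in <(\w+)[^>]*>: "<", tag name (captured), anything up to ">"
OPEN_TAG = re.compile(r"<(\w+)[^>]*>")


def match_element(text: str, pos: int):
    r"""Return the end index of a balanced element starting at pos, or None.

    Approximates (<(\w+)[^>]*>(?:(?!</?\2\b).|(?1))*</\2>) without full backtracking.
    """
    opening = OPEN_TAG.match(text, pos)
    if opening is None:
        return None
    name = opening.group(1)
    # Equivalent of the lookahead (?!</?\2\b): "<" or "</" followed by the same name
    same_name = re.compile(r"</?" + re.escape(name) + r"\b")
    closing = "</" + name + ">"  # the closing tag </\2> must match exactly
    i = opening.end()
    while i < len(text):
        if same_name.match(text, i) is None:
            i += 1  # the "(?!...)." branch: an ordinary character
        elif text.startswith(closing, i):
            return i + len(closing)  # found the corresponding closing tag
        else:
            nested_end = match_element(text, i)  # the (?1) branch: recurse
            if nested_end is None:
                return None
            i = nested_end
    return None  # ran out of text before the closing tag


def keep_elements(text: str) -> str:
    """Stand-in for the whole find/replace, including (?{1}$1\\n:)."""
    pieces = []
    pos = 0
    while pos < len(text):
        end = match_element(text, pos)
        if end is None:
            pos += 1  # the "|." branch: this character is replaced with nothing
        else:
            pieces.append(text[pos:end] + "\n")  # Group 1 matched: keep it + newline
            pos = end
    return "".join(pieces)


if __name__ == "__main__":
    print(keep_elements("x<a><a>1</a></a>y"), end="")
    # <a><a>1</a></a>
```

The usage line shows the point of the recursion: an element nested inside another element with the same name is kept as part of its parent instead of ending the match early.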

Regex details:

(<(\w+)[^>]*>(?:(?!</?\2\b).|(?1))*</\2>) - capturing group 1 (later recursed with the (?1) subroutine call), matching:
- <(\w+)[^>]*> - any opening tag, with its name captured into Group 2
- (?:(?!</?\2\b).|(?1))* - zero or more occurrences of:
  - (?!</?\2\b). - any char (.) that does not start the sequence <, an optional /, and the Group 2 tag name as a whole word (that is, not the start of another opening or closing tag with the same name)
  - | - or
  - (?1) - the whole Group 1 subpattern, recursed (repeated), so a nested element is matched as a unit
- </\2> - the corresponding closing tag
| - or
. - any single char.
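Of all these pieces, only the (?1) recursion is missing from Python's re module, which does accept a backreference such as \2 inside a lookahead. Dropping the |(?1) branch therefore gives a variant that can be tried in Python directly. This is a simplification, not the answer's pattern: it accepts any tag name, but it cannot match an element nested inside another element with the same name. The helper name keep_unnested is hypothetical, and re.DOTALL stands in for ". matches newline".

```python
import re

# The answer's pattern with the "|(?1)" recursion removed:
#   Group 1 = opening tag (name in Group 2), then chars that do not start
#   <name or </name, then the matching closing tag; else "." eats one char.
UNNESTED = re.compile(r"(<(\w+)[^>]*>(?:(?!</?\2\b).)*</\2>)|.", re.DOTALL)


def keep_unnested(text: str) -> str:
    # The lambda plays the role of the conditional replacement (?{1}$1\n:)
    return UNNESTED.sub(
        lambda m: m.group(1) + "\n" if m.group(1) is not None else "", text
    )


if __name__ == "__main__":
    # Same-name nesting defeats this variant: only the inner element survives.
    print(keep_unnested("<a><a>1</a></a>"), end="")
    # <a>1</a>
```

On the question's sample, where no element contains another element of the same name, this variant produces the expected result.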

Replacement pattern:

(?{1} - if Group 1 matched:
- $1\n - replace with its contents + a newline
- : - else, replace with an empty string
) - end of the conditional replacement pattern.

Answer 1: accepted, 2017-03-30 11:10:03