Regex to remove all except XML
I need help with a regex for Notepad++ that matches everything except the XML.
The regex I'm using is (!?\<.*\>), and I want the opposite of it (in the first three lines).
The example code:
[20173003] This text is what I want to delete [<Person><Name>Foo</Name><Surname>Bar</Surname></Person>], and this text too.
[20173003] This is another text to delete [<Person><Name>Bar</Name><Surname>Foo</Surname></Person>]
[20173003] This text too... [<Person><Name>Lorem</Name><Surname>Ipsum</Surname></Person>], delete me!
[20173003] But things like this make the regex to fail < [<Person><Name>Lorem</Name><Surname>Ipsum</Surname></Person>], or this>
Expected result:
<Person><Name>Foo</Name><Surname>Bar</Surname></Person>
<Person><Name>Bar</Name><Surname>Foo</Surname></Person>
<Person><Name>Lorem</Name><Surname>Ipsum</Surname></Person>
<Person><Name>Lorem</Name><Surname>Ipsum</Surname></Person>
Thanks in advance!
This is not perfect, but it should work with your input, which looks quite simple and well-structured.
If you need to handle just a single, unnested <Person> tag, you may use the simple regex (<Person>.*?</Person>)|. (it matches any <Person> element and captures it into Group 1, or else matches any other single char) and replace with the conditional replacement pattern (?{1}$1\n:) (it reinserts the Person element followed by a newline when Group 1 matched, and otherwise replaces the match with an empty string).
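For checking the simple pattern outside Notepad++, here is a minimal Python sketch. Python's re module has no equivalent of the (?{1}$1\n:) conditional replacement syntax, so a callback function stands in for it. The names keep_person_elements and sample are illustrative only, and re.DOTALL plays the role of the ". matches newline" option.

```python
import re

# Same pattern as the simple case: Group 1 captures a <Person> element,
# the second alternative matches any other single character.
PATTERN = re.compile(r"(<Person>.*?</Person>)|.", re.DOTALL)


def keep_person_elements(text):
    """Keep each <Person> element on its own line and drop everything else."""
    def repl(match):
        # Stand-in for the Notepad++ replacement (?{1}$1\n:)
        if match.group(1) is not None:
            return match.group(1) + "\n"
        return ""
    return PATTERN.sub(repl, text)


sample = (
    "[20173003] This text is what I want to delete "
    "[<Person><Name>Foo</Name><Surname>Bar</Surname></Person>], and this text too.\n"
    "[20173003] This is another text to delete "
    "[<Person><Name>Bar</Name><Surname>Foo</Surname></Person>]\n"
    "[20173003] This text too... "
    "[<Person><Name>Lorem</Name><Surname>Ipsum</Surname></Person>], delete me!\n"
    "[20173003] But things like this make the regex to fail < "
    "[<Person><Name>Lorem</Name><Surname>Ipsum</Surname></Person>], or this>\n"
)

result = keep_person_elements(sample)
print(result, end="")
```

The stray < and > in the last sample line are harmless here, because the first alternative only succeeds at a literal <Person> and every other character falls through to the dot.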
To make it a bit more generic, you may capture an opening XML tag together with its corresponding closing tag using a recursion-based Boost regex, and the appropriate conditional replacement pattern:
Find What: (<(\w+)[^>]*>(?:(?!</?\2\b).|(?1))*</\2>)|.
Replace With: (?{1}$1\n:)
. matches newline: ON
Regex Details:
(<(\w+)[^>]*>(?:(?!</?\2\b).|(?1))*</\2>) - Capturing group 1 (later recursed with the (?1) subroutine call) matching:
  <(\w+)[^>]*> - any opening tag, with its name captured into Group 2
  (?:(?!</?\2\b).|(?1))* - zero or more occurrences of:
    (?!</?\2\b). - any char (.) that does not start a sequence of <, an optional /, and the Group 2 tag name followed by a word boundary
    | - or
    (?1) - the whole Group 1 subpattern, recursed (repeated)
  </\2> - the corresponding closing tag
| - or
. - any single char.
Replacement pattern:
(?{1} - if Group 1 matched:
  $1\n - replace with its contents + a newline
  : - else, replace with an empty string
) - end of the replacement pattern.
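The matching logic of the generic pattern can also be sketched in Python. The standard re module does not support the (?1) recursion used above, so this sketch swaps in a depth counter over same-named tags. It is not a port of the Boost regex. The function names find_element_end and extract_elements are hypothetical, and self-closing tags, comments and CDATA sections are not handled.

```python
import re

# Any opening tag; Group 1 is the tag name (the role of Group 2 in the regex above).
OPEN_TAG = re.compile(r"<(\w+)[^>]*>")


def find_element_end(text, opening):
    """Return the index just past the closing tag that balances `opening`, or None."""
    name = re.escape(opening.group(1))
    # Opening or closing tags carrying the same name as the element being matched.
    same_name = re.compile(rf"<(/?){name}\b[^>]*>")
    depth = 1
    for tag in same_name.finditer(text, opening.end()):
        depth += -1 if tag.group(1) else 1
        if depth == 0:
            return tag.end()
    return None


def extract_elements(text):
    """Collect balanced top-level elements in order; all other characters are skipped."""
    found = []
    pos = 0
    while pos < len(text):
        opening = OPEN_TAG.match(text, pos)
        end = find_element_end(text, opening) if opening else None
        if end is None:
            pos += 1  # same role as the `.` alternative: skip one character
        else:
            found.append(text[pos:end])
            pos = end
    return found


line = ("[20173003] fails < [<Person><Name>Lorem</Name>"
        "<Surname>Ipsum</Surname></Person>], or this>")
print("\n".join(extract_elements(line)))
```

Joining the returned elements with newlines gives the same shape of output as the $1\n replacement. The depth counter is what lets an element nested inside another element of the same name stay within a single match.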