I need help with a regex for Notepad++ to match everything but XML
The regex I'm using: (!?\<.*\>) <-- I want the opposite of this (in the first three lines)
The example code:
[20173003] This text is what I want to delete [<Person><Name>Foo</Name><Surname>Bar</Surname></Person>], and this text too.
[20173003] This is another text to delete [<Person><Name>Bar</Name><Surname>Foo</Surname></Person>]
[20173003] This text too... [<Person><Name>Lorem</Name><Surname>Ipsum</Surname></Person>], delete me!
[20173003] But things like this make the regex to fail < [<Person><Name>Lorem</Name><Surname>Ipsum</Surname></Person>], or this>
Expected result:
<Person><Name>Foo</Name><Surname>Bar</Surname></Person>
<Person><Name>Bar</Name><Surname>Foo</Surname></Person>
<Person><Name>Lorem</Name><Surname>Ipsum</Surname></Person>
<Person><Name>Lorem</Name><Surname>Ipsum</Surname></Person>
Thanks in advance!
This is not perfect, but it should work with your input, which looks quite simple and well-structured.
If you need to handle just a single unnested <Person> tag, you may use a simple (<Person>.*?</Person>)|. regex (it matches and captures any <Person> tag into Group 1, and matches any other char) and replace with the conditional replacement pattern (?{1}$1\n:) (it reinserts the <Person> tag with a newline after it, or replaces the match with an empty string).
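To try the simple variant outside Notepad++, here is a minimal Python sketch of the same idea. Python's re module has no conditional replacement syntax, so a replacement function stands in for (?{1}$1\n:). The names LINES, keep_person and extract_person_tags are made up for this illustration, and re.DOTALL is my assumption so that the lone dot also eats the original line breaks.

```python
import re

# Sample log lines copied from the question.
LINES = [
    "[20173003] This text is what I want to delete [<Person><Name>Foo</Name><Surname>Bar</Surname></Person>], and this text too.",
    "[20173003] This is another text to delete [<Person><Name>Bar</Name><Surname>Foo</Surname></Person>]",
    "[20173003] This text too... [<Person><Name>Lorem</Name><Surname>Ipsum</Surname></Person>], delete me!",
    "[20173003] But things like this make the regex to fail < [<Person><Name>Lorem</Name><Surname>Ipsum</Surname></Person>], or this>",
]

# Group 1 captures a <Person> element; the lone `.` consumes any other char.
# re.DOTALL (an assumption) lets `.` also consume the original line breaks.
PATTERN = re.compile(r"(<Person>.*?</Person>)|.", re.DOTALL)


def keep_person(match):
    """Stand-in for the conditional replacement (?{1}$1\\n:)."""
    if match.group(1) is not None:
        return match.group(1) + "\n"  # Group 1 matched: keep it, add a newline
    return ""  # any other char: replace with an empty string


def extract_person_tags(text):
    return PATTERN.sub(keep_person, text)


if __name__ == "__main__":
    print(extract_person_tags("\n".join(LINES)), end="")
```

Run against the four sample lines, this should leave only the four <Person> elements, one per line, including the last line with the stray < and >.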
To make it a bit more generic, you may capture the opening and corresponding closing XML tags with a recursion-based Boost regex and use the appropriate conditional replacement pattern:
Find What: (<(\w+)[^>]*>(?:(?!</?\2\b).|(?1))*</\2>)|.
Replace With: (?{1}$1\n:)
. matches newline: ON
Regex details:
(<(\w+)[^>]*>(?:(?!</?\2\b).|(?1))*</\2>) - Capturing group 1 (later recursed with the (?1) subroutine call) matching:
  <(\w+)[^>]*> - any opening tag, with its name captured into Group 2
  (?:(?!</?\2\b).|(?1))* - zero or more occurrences of:
    (?!</?\2\b). - any char (.) that does not start a sequence of <, an optional /, and the Group 2 tag name as a whole word
    | - or
    (?1) - the whole Group 1 subpattern is recursed (repeated)
  </\2> - the corresponding closing tag
| - or
. - any single char.
Replacement pattern:
(?{1} - if Group 1 matches:
  $1\n - replace with its contents + a newline
: - else replace with an empty string
) - end of the replacement pattern.
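If you want to check the generic behaviour outside Notepad++, note that Python's standard re module does not support (?1) recursion, so the sketch below swaps in a different technique: a depth counter over same-name opening and closing tags. It is only a rough emulation of the regex above (for instance, it tolerates attributes or spaces in the closing tag), and the names OPEN_TAG, match_element and extract_elements are made up for this illustration.

```python
import re

# Any opening tag; its name plays the role of Group 2 in the regex.
OPEN_TAG = re.compile(r"<(\w+)[^>]*>")


def match_element(text, start):
    """Return the end index of the balanced element at `start`, or None."""
    opening = OPEN_TAG.match(text, start)
    if opening is None:
        return None
    name = re.escape(opening.group(1))
    # Same-name opening tag (empty group 1) or closing tag (group 1 is "/").
    token = re.compile(r"<(/?)" + name + r"\b[^>]*>")
    depth = 1  # replaces the (?1) recursion with a nesting counter
    pos = opening.end()
    while depth:
        tok = token.search(text, pos)
        if tok is None:
            return None  # no matching closing tag: not a balanced element
        depth += -1 if tok.group(1) else 1
        pos = tok.end()
    return pos


def extract_elements(text):
    """Keep balanced elements (one per line) and drop every other char."""
    out = []
    pos = 0
    while pos < len(text):
        end = match_element(text, pos) if text[pos] == "<" else None
        if end is None:
            pos += 1  # the `|.` branch: drop a single char
        else:
            out.append(text[pos:end] + "\n")  # the `$1\n` branch
            pos = end
    return "".join(out)


if __name__ == "__main__":
    print(extract_elements("a<div>x<div>y</div>z</div>b"), end="")
```

The counter keeps nested same-name elements together, which is what the recursion buys you in the Boost pattern. An unclosed tag is simply dropped char by char, like the . alternative does.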