DISCLAIMER: I know that using regex on XML is risky and generally a bad idea, but I can only feed regex into my syntax highlighting engine, and I can't spend the resources required to create a new system just for XML-based languages.
So I'm trying to use regex to get the values inside XML tags, as such:
<LoremIpsum>I NEED THIS PART</LoremIpsum>
I thought this would be nice and easy, and that I could just use (>.*<\\/). It works perfectly on any online regex tester, but as soon as I try it in .NET it breaks, and I get completely unpredictable output. What would be the correct way to do this in a single regular expression, considering I'm using .NET's System.Text.RegularExpressions?
This is probably because quantifiers such as * are greedy by default in .NET regular expressions. My suggestion would be to use the non-greedy .*? or to replace . with [^<]:
(>.*?<\/)
(>[^<]*<\/)
With the second pattern, the match can't run past a <.
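To see how the three patterns behave in .NET, here is a minimal sketch, not a definitive implementation. The sample input and the helper name MatchesOf are made up for illustration; only Regex.Matches and the patterns themselves come from the discussion above. It assumes a C# 9+ project with top-level statements.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical sample input: two sibling elements on one line.
string xml = "<LoremIpsum>I NEED THIS PART</LoremIpsum><Other>AND THIS</Other>";

// Hypothetical helper: collects the text of every match of a pattern.
static List<string> MatchesOf(string input, string pattern)
{
    var results = new List<string>();
    foreach (Match m in Regex.Matches(input, pattern))
    {
        results.Add(m.Value);
    }
    return results;
}

// Verbatim strings (@"...") pass the backslash to the regex engine unchanged.
string[] patterns = { @"(>.*<\/)", @"(>.*?<\/)", @"(>[^<]*<\/)" };

foreach (string pattern in patterns)
{
    Console.WriteLine(pattern);
    foreach (string value in MatchesOf(xml, pattern))
    {
        Console.WriteLine("  " + value);
    }
}
```

On this input the greedy pattern should produce a single match that runs from the first > to the last </. The non-greedy pattern finds the first value correctly, but its second match starts at the > that closes </LoremIpsum> and so still includes the <Other> tag. Only the [^<] version should return both values without crossing a <.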
You never define what "completely messes up" means, but try this:
(>.*?<\/)
The ? in .*? makes it a non-greedy match. By default, regular expression quantifiers are greedy, meaning they match as much as possible. The non-greedy form matches as little as possible. To see the difference, match both forms against an element containing the text 'is <a>test</a> of': with (>.*<\/) you will match is <a>test</a> of, and with (>.*?<\/) you will match is <a>test (in both cases together with the delimiting > and </).
If you want to avoid any XML tags in the match, then you should use @ThomasWeller's solution.
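The comparison above can be reproduced with a short sketch. The outer tag name x and the helper FirstMatch are assumptions made for illustration, since the thread does not give the full input. The last line is also my own variation rather than something from either answer: it moves the parentheses so that group 1 holds only the text between > and </.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical input: the outer tag name "x" is made up for this example.
string input = "<x>is <a>test</a> of</x>";

// Hypothetical helper: text of the first match, or "" when nothing matches.
static string FirstMatch(string text, string pattern)
{
    return Regex.Match(text, pattern).Value;
}

Console.WriteLine(FirstMatch(input, @"(>.*<\/)"));    // greedy
Console.WriteLine(FirstMatch(input, @"(>.*?<\/)"));   // non-greedy
Console.WriteLine(FirstMatch(input, @"(>[^<]*<\/)")); // cannot cross a <

// Variation: capture only the inner text, without the > and </ delimiters.
string inner = Regex.Match(input, @">([^<]*)<\/").Groups[1].Value;
Console.WriteLine(inner);
```

The greedy form should stop at the last </ in the input, the non-greedy form at the first one, and the [^<] form should skip ahead to the innermost text that contains no tags at all.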