
regex to highlight XML values

DISCLAIMER: I know that using regex on XML is risky and generally a bad idea, but I can only feed regex into my syntax highlighting engine, and I can't spend the resources required to create a new system just for XML-based languages.


So I'm trying to use regex to get the values inside XML tags, as such:

<LoremIpsum>I NEED THIS PART</LoremIpsum>

I thought this would be nice and easy, and that I could just use (>.*<\/) . It works perfectly in every online regex tester, but as soon as I try it in .NET it breaks and I get completely unpredictable output. What would be the correct way to do this in a single regex, given that I'm using .NET's System.Text.RegularExpressions ?

This is probably because regex quantifiers are greedy by default, in .NET as in most other engines. My suggestion would be to use the non-greedy .*? or to replace . with the negated class [^<] :

(>.*?<\/)
(>[^<]*<\/)

With [^<] the match cannot cross a < .
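To make the difference between the patterns concrete, here is a minimal sketch that runs all three against one line holding two adjacent elements. It uses Python's `re` module as a stand-in for the .NET engine, on the assumption that greedy, lazy, and negated-class matching behave the same way for these simple patterns; the sample string and the helper name `all_matches` are made up for illustration.

```python
import re

# Hypothetical sample: two elements next to each other on one line.
SAMPLE = "<a>one</a><b>two</b>"

GREEDY = r"(>.*<\/)"      # .* takes as much as it can
LAZY = r"(>.*?<\/)"       # .*? takes as little as it can
NEGATED = r"(>[^<]*<\/)"  # [^<]* can never step over a '<'


def all_matches(pattern, text):
    """Return every non-overlapping match of pattern in text, left to right."""
    return [m.group(0) for m in re.finditer(pattern, text)]


greedy_hits = all_matches(GREEDY, SAMPLE)
lazy_hits = all_matches(LAZY, SAMPLE)
negated_hits = all_matches(NEGATED, SAMPLE)

print(greedy_hits)   # ['>one</a><b>two</']
print(lazy_hits)     # ['>one</', '><b>two</']
print(negated_hits)  # ['>one</', '>two</']
```

The greedy pattern swallows both elements in one match. The lazy pattern stops at the first </ , but its second match starts at the > of the closing </a> and runs across <b> , so it can still pick up tag text. Only the [^<] version stays inside a single value each time, which is probably what a highlighter wants.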

You never define what "completely messes up" means, but try this:

(>.*?<\/)

The ? in .*? makes it a non-greedy match. By default, regular expression quantifiers are greedy, meaning they match as much as possible. The non-greedy form matches as little as possible. To see the difference, match '<a>is <a>test</a> of</a>' against both forms: with (>.*<\/) the text between the > and </ delimiters is: is <a>test</a> of . With (>.*?<\/) it is: is <a>test .

If you want to avoid any XML tags in the match, then you should use @ThomasWeller's solution.
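The comparison above can be checked with a short sketch. It assumes the sample string is `<a>is <a>test</a> of</a>` (reconstructed from the matches quoted in the answer) and again uses Python's `re` as a stand-in for .NET. The capture group in the last pattern is an addition for illustration, to pull out the value without the > and </ delimiters.

```python
import re

# Assumed sample, reconstructed from the matches quoted in the answer.
SAMPLE = "<a>is <a>test</a> of</a>"

# Greedy: runs from the first '>' to the last '</' in the string.
greedy = re.search(r">.*<\/", SAMPLE).group(0)

# Lazy: runs from the first '>' to the nearest '</' after it.
lazy = re.search(r">.*?<\/", SAMPLE).group(0)

# Negated class: cannot cross '<', so it skips ahead to the innermost value.
# The parentheses capture only the value, without the delimiters.
no_tags = re.search(r">([^<]*)<\/", SAMPLE)

print(greedy)            # >is <a>test</a> of</
print(lazy)              # >is <a>test</
print(no_tags.group(0))  # >test</
print(no_tags.group(1))  # test
```

In .NET the same patterns could be passed to Regex.Match or Regex.Matches; whether the highlighting engine can color a capture group rather than the whole match is something the question does not say, so treat the group as optional.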
