
Matching last occurrence of character using Regex

I need to match:

<p><span style="font-size: 18px;"><strong>Hello</strong></span></p>

I need to match the text Hello, which sits between the last > and the first </.

Using (?=>)(.*?)(?=</) returns <span style="font-size: 18px;"><strong>Hello

Thanks!

I know this is not the answer you were looking for, but parsing HTML with regex is like eating soup with a fork. You'll get the job done eventually, but it's very frustrating.

Try this instead and keep your sanity:

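// Requires: using System.Linq; (for Last()). Works only if the markup is well-formed XML, as it is here.
string html = "<p><span style=\"font-size: 18px;\"><strong>Hello</strong></span></p>";
System.Xml.Linq.XDocument doc = System.Xml.Linq.XDocument.Parse(html);
// Descendants() walks in document order (p, span, strong), so the last one is the innermost element.
string hello = doc.Descendants().Last().Value;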

You could go with

/>([^<>]+)</

That should give you the desired match in the first capture group.
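For what it's worth, here is a minimal sketch of how that could look from C#. It assumes the pattern is meant as >([^<>]+)</ with the leading / being a regex delimiter (.NET takes the pattern as a plain string, without delimiters). The class and method names are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class InnerTextDemo
{
    // Hypothetical helper: returns the first run of text that sits directly
    // between a '>' and a following "</", or an empty string if there is none.
    public static string ExtractInnerText(string html)
    {
        // [^<>]+ cannot cross a tag boundary, so the match cannot start
        // at the '>' of an outer tag the way .*? does.
        Match m = Regex.Match(html, @">([^<>]+)</");
        return m.Success ? m.Groups[1].Value : string.Empty;
    }
}
```

Called with the string from the question, ExtractInnerText should return Hello. The overall match still includes the surrounding > and </, which is why the value is read from Groups[1].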

Do you only need to match this specific string? If yes, then you could simply use:

/<strong>([^<]*)</strong>/

which will match any text between the strong tags.

Try this.

Define the regex pattern as a constant:

const string HTML_TAG_PATTERN = "<.*?>";

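The function:

static string StripHTML(string inputString)
{
    // Requires: using System.Text.RegularExpressions;
    // Removes every tag, leaving only the text between tags.
    return Regex.Replace(inputString, HTML_TAG_PATTERN, string.Empty);
}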

Then call the function like this:

string str = "<p><span style='font-size: 18px;'><strong>Hello</strong></span></p>";

str = StripHTML(str); // str is now "Hello"

I think your first lookahead should instead be a lookbehind for >: (?<=>)

Also replace .*? with [^<>]* (anything except < or >).

If you need to keep the lookarounds, you can use: (?<=>)([^<>]*)(?=</)

If not, you can simply use: >([^<>]*)</

The difference is that with lookarounds, the > and the </ are not included in the overall match.
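To make that difference concrete, here is a small C# sketch that runs both patterns against the string from the question. The class, constant, and method names are made up for illustration; only Regex.Match, Match.Value, and Match.Groups come from .NET.

```csharp
using System;
using System.Text.RegularExpressions;

static class LookaroundDemo
{
    // The HTML from the question.
    public const string Sample =
        "<p><span style=\"font-size: 18px;\"><strong>Hello</strong></span></p>";

    // Lookaround version: '>' and "</" are asserted but not consumed,
    // so the overall match is just the text.
    public static string WithLookaround(string html) =>
        Regex.Match(html, @"(?<=>)([^<>]*)(?=</)").Value;

    // Plain version: the overall match includes the delimiters...
    public static string WholeMatchWithoutLookaround(string html) =>
        Regex.Match(html, @">([^<>]*)</").Value;

    // ...so the text has to be read from capture group 1 instead.
    public static string GroupWithoutLookaround(string html) =>
        Regex.Match(html, @">([^<>]*)</").Groups[1].Value;
}
```

On Sample, WithLookaround gives Hello, WholeMatchWithoutLookaround gives >Hello</, and GroupWithoutLookaround gives Hello. One thing to keep in mind: because of the *, both patterns can also produce an empty match wherever a > is immediately followed by </ (for example between </strong> and </span>), so if you iterate over all matches you may prefer [^<>]+.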


 