
Match text between tags, but not inside the tags themselves

I am trying to implement a simple 'find on page' engine using the GeckoFX web browser control (I am not satisfied with 'window.find()' and cannot get anything else to work).

The idea is to wrap the searched text in <span style="background-color: gold;">searched text</span> inside the InnerHtml of the cell or paragraph that contains the searched string.

I look for matches in the cell's text (cell.TextContent in the code below), and when I find one I replace cell.InnerHtml. However, if cell.InnerHtml also contains the searched string inside a tag, for example in an attribute, that tag gets broken.

Perhaps the code explains it better. Here is my input string:

<span><a href="/some random link containing text">test search text that should be found</a></span>

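code:

// requires: using System; using System.Text.RegularExpressions;
string goldSpanStyle = "<span style=\"background-color: gold;\">";
string textToFind = "text";
string cellHtml = cell.InnerHtml;
string match = "";

int index = cell.TextContent.IndexOf(textToFind, StringComparison.OrdinalIgnoreCase);
if (index >= 0)
{
    // keep the casing of the text as it appears on the page
    match = cell.TextContent.Substring(index, textToFind.Length);
}

if (match != "")
{
    // problem: this also replaces occurrences inside tags, e.g. in the href
    cell.InnerHtml = Regex.Replace(cellHtml, Regex.Escape(textToFind), goldSpanStyle + match + "</span>", RegexOptions.IgnoreCase);
}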

In this case the HTML gets broken, because the span is also inserted into the href attribute:

<span><a href="/some random link containing <span style="background-color: gold;">text</span>">test search <span style="background-color: gold;">text</span> that should be found</a></span>

I need a regex that only matches text that is not inside tags. I tried this:

(?!(<[^>]+>))(text)(?=<\/[^>]+>)

but the results were not good: it only matches when the search string ends right before a closing tag, i.e. when its last letter is directly followed by "</...>". In this input that happens only for the whole link text, whose last letter 'd' precedes </a>:

(?!(<[^>]+>))test search text that should be found(?=<\/[^>]+>)
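To see why the attempted pattern behaves this way, the sketch below runs it against the input string from the question. It assumes a C# 9+ top-level program; the variable names are illustrative only. The leading (?!(<[^>]+>)) never rejects anything here, because a match must start with 't', not '<', so the whole decision is made by the trailing lookahead, which demands that a closing tag follow immediately.

```csharp
using System;
using System.Text.RegularExpressions;

// Input string from the question (quotes unescaped).
string html = "<span><a href=\"/some random link containing text\">test search text that should be found</a></span>";

// Attempted pattern: "text" must be immediately followed by a closing tag.
string attempted = @"(?!(<[^>]+>))(text)(?=<\/[^>]+>)";

// "text" is followed by "\">" in the href and by " that should be found"
// in the link text, so the lookahead fails at both occurrences.
int attemptedCount = Regex.Matches(html, attempted, RegexOptions.IgnoreCase).Count;
Console.WriteLine($"attempted pattern matches: {attemptedCount}");

// The same construction succeeds only when the phrase ends right before
// "</...>", e.g. the whole link text, whose last letter 'd' precedes "</a>".
string wholePhrase = @"(?!(<[^>]+>))test search text that should be found(?=<\/[^>]+>)";
int phraseCount = Regex.Matches(html, wholePhrase, RegexOptions.IgnoreCase).Count;
Console.WriteLine($"whole-phrase pattern matches: {phraseCount}");
```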

Thanks in advance for any help and advice.
Bartosz

=== Edit:

Basically, in a sample string like <a href="www.match.com">match</a>, I need to match only the second "match" (the link text), not the one inside the tag <a href="www.match.com">.

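=== Answer:

The regex below matches "test" or "match" only where it appears in text between tags, such as the second "match" above, and not inside a tag. The lookahead (?=[^<>]*<) requires that the next angle bracket after the word is '<', the start of the following tag; inside a tag, for example in the href, the next angle bracket is the tag's own '>', so the lookahead fails: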

(test|match)(?=[^<>]*<)
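To plug this pattern into the question's C# code, one option is to build the lookahead around the escaped search term and call Regex.Replace once on the InnerHtml. This is a minimal sketch, assuming the input is serialized HTML in which a literal '<' inside text appears as &lt;; the helper name HighlightOutsideTags is hypothetical, and the code is written as a C# 9+ top-level program.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: wraps each occurrence of textToFind that lies in text
// content (between tags) in a gold span, leaving occurrences inside tags,
// such as in an href, untouched. (?=[^<>]*<) requires the next angle bracket
// after the match to be '<'; inside a tag the next bracket is '>', so it fails.
string HighlightOutsideTags(string html, string textToFind)
{
    const string goldSpanStyle = "<span style=\"background-color: gold;\">";
    // Regex.Escape keeps characters like '.' or '(' in the search text literal.
    string pattern = Regex.Escape(textToFind) + "(?=[^<>]*<)";
    // $0 reinserts the matched text, so each hit keeps its own casing.
    return Regex.Replace(html, pattern, goldSpanStyle + "$0</span>", RegexOptions.IgnoreCase);
}

string input = "<span><a href=\"/some random link containing text\">test search text that should be found</a></span>";
string highlighted = HighlightOutsideTags(input, "text");
Console.WriteLine(highlighted);

string sample = "<a href=\"www.match.com\">match</a>";
string highlightedSample = HighlightOutsideTags(sample, "match");
Console.WriteLine(highlightedSample);

// Limitation: text after the last tag has no '<' ahead of it, so it is skipped.
string trailing = "<b>bold</b> trailing text";
string highlightedTrailing = HighlightOutsideTags(trailing, "text");
Console.WriteLine(highlightedTrailing);
```

The last case shows where the lookahead approach stops working: text after the last tag, or a cell whose InnerHtml contains no tags at all, is never highlighted. It also assumes attribute values contain no raw '<' or '>'.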


The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM