
How to make sure searched text in a C# WebBrowser control is actual text and not an element or attributes?

I am leaving this question up in case anyone can still answer it, but I am going a different route for my search.

I know there are several similar questions on here, but none of them gets me where I need to go.

I have the search part basically finished, and it works beautifully: it finds all occurrences of the searched word or phrase, ignoring case. The problem is that if you search for "div" or "table" or some other word that is an HTML element name or attribute value, the search tries to highlight that too. Because the replacement runs over the raw markup, it rewrites the tags themselves and totally screws up the page.

So I really just need a simple way to make sure the search ignores those occurrences. Here is what I have. I assume I probably need a really good regex but I can't write a regex to save my life, so help would be appreciated.

private void PerformSearch()
{
  // Escape the input so characters like "." or "(" are matched literally.
  string searchString = Regex.Escape(SearchTextBox.Text);
  HtmlDocument doc = ManualViewBrowser.Document;

  // Runs over the raw markup, so tag names and attribute values match too.
  doc.Body.InnerHtml = Regex.Replace(doc.Body.InnerHtml, searchString, new MatchEvaluator(Highlight), RegexOptions.IgnoreCase);
}

private string Highlight(Match m)
{
  return "<em class=\"highlight\">" + m.Value + "</em>";
}

Just remove all HTML tags from that HTML string with this method:

private string RemoveHtmlTags(string html) {
  // [^>]* also covers tags that span several lines, which ".*?" would miss.
  return Regex.Replace(html, "<[^>]*>", String.Empty);
}

Edit:

You are right. So instead of searching inside the HTML string, just loop through all the nodes of the page and search for the word inside their text.
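
As a rough sketch of that idea (not code from the original post), one way to approximate "search only inside the text" without walking the DOM is to split the markup into tags and text runs, and apply the highlight only to the text runs. The names TextOnlyHighlighter and HighlightText are made up for this example. It assumes reasonably well-formed markup, and it does not handle script or style contents, comments, a ">" inside an attribute value, or matches inside entities such as "&amp;".

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original question or answer.
public static class TextOnlyHighlighter
{
    // Matches either a complete tag ("tag" group) or a run of text between tags ("text" group).
    private static readonly Regex TagOrText =
        new Regex(@"(?<tag><[^>]*>)|(?<text>[^<]+)", RegexOptions.Compiled);

    public static string HighlightText(string html, string searchString)
    {
        if (string.IsNullOrEmpty(searchString))
        {
            return html; // nothing to search for
        }

        // Treat the user's input as literal text, not as a regex pattern.
        string pattern = Regex.Escape(searchString);

        return TagOrText.Replace(html, m =>
        {
            // Leave tags (element names, attributes) exactly as they are.
            if (m.Groups["tag"].Success)
            {
                return m.Value;
            }

            // Highlight matches only inside the text run, keeping the original casing.
            return Regex.Replace(
                m.Value,
                pattern,
                hit => "<em class=\"highlight\">" + hit.Value + "</em>",
                RegexOptions.IgnoreCase);
        });
    }
}
```

With something like this, the body of PerformSearch from the question could presumably become a single assignment: doc.Body.InnerHtml = TextOnlyHighlighter.HighlightText(doc.Body.InnerHtml, SearchTextBox.Text). A real DOM walk over text nodes would be more robust than any regex over markup, so treat this as a stopgap.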
