How to identify HTML tags in an HTML string

Question

I have the HTML string below, and I am trying to identify the <br> tags at the start and end of the whole text inside it, using the code below:

 var htmlString = "<p><span><br> text <b>text &nbsp;<br></b>text <br></span></p>";
 var document = new HtmlDocument();
 document.LoadHtml(htmlString);

 // Selects every <br> element in the document, wherever it appears
 var nodes = document.DocumentNode.SelectNodes("//br");

But this returns every <br> node, whereas I only want the ones at the start and at the end of the whole text in the HTML string below:

<p><span><br> text <b> text&nbsp;<br></b>text <br></span></p>

I expect 2 nodes but get 3, because the <br> tag in the middle of the text is counted as well.

Could anyone please help me achieve this? Many thanks in advance.

Answer 1

You can use the Split method to solve your problem. My suggestion is below: it prints the text between the first and the last <br> tags. You can modify the output according to your requirements. It could probably also be solved with a regex pattern.

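const string tag = "<br>";
// The string[] overload of Split is available on every .NET version
var splitedHtmlString = htmlString.Split(new[] { tag }, StringSplitOptions.None);
StringBuilder builder = new StringBuilder();
// Skip the first and last pieces: they lie outside the outermost <br> tags
for (int i = 1; i < splitedHtmlString.Length - 1; i++)
{
     builder.Append(splitedHtmlString[i]);
     builder.Append(tag);
}
// Drop the trailing <br> appended by the last iteration (if anything was appended)
if (builder.Length >= tag.Length)
{
     builder.Length -= tag.Length;
}
Console.WriteLine(builder.ToString());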

Output:  text <b>text &nbsp;<br></b>text
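If you only need to know where the first and the last <br> sit in the string, rather than the text between them, a minimal sketch along the same lines could use IndexOf and LastIndexOf. The class name BrLocator and the method FindOuterBreaks are hypothetical, and the sketch assumes the tag is always written literally as <br> (not <br/> or <br />):

```csharp
using System;

public static class BrLocator
{
    // Hypothetical helper: returns the character offsets of the first and
    // last occurrence of the tag, or -1 for both when the tag is absent.
    // Assumption: the tag is spelled exactly as passed in (e.g. "<br>").
    public static (int First, int Last) FindOuterBreaks(string html, string tag = "<br>")
    {
        int first = html.IndexOf(tag, StringComparison.OrdinalIgnoreCase);
        int last = html.LastIndexOf(tag, StringComparison.OrdinalIgnoreCase);
        return (first, last);
    }

    public static void Main()
    {
        const string html = "<p><span><br> text <b>text &nbsp;<br></b>text <br></span></p>";
        var (first, last) = FindOuterBreaks(html);
        Console.WriteLine($"first <br> at offset {first}, last <br> at offset {last}");
    }
}
```

Like the Split approach, this works on the raw string and does not check whether any visible text comes before the first tag or after the last one.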

Answer 2

You can load your string into an HtmlDocument and select nodes with XPath, using the HtmlAgilityPack library:

HtmlDocument document = new HtmlDocument();

document.LoadHtml("your html code");

var htmlTag = document.DocumentNode.SelectNodes("//br");
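Note that "//br" on its own matches every <br>, including the one in the middle of the text. One way to narrow it down, sketched here under assumptions, is to walk the nodes in document order and keep only the <br> elements that come before the first non-blank text node or after the last one. The names OuterBreakFinder and FindOuterBreaks are hypothetical, and "start" and "end" are taken to mean "no visible text before" and "no visible text after":

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class OuterBreakFinder
{
    // Hypothetical helper: returns the <br> nodes that precede the first
    // visible text node or follow the last one.
    public static List<HtmlNode> FindOuterBreaks(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        // Descendants() enumerates the tree in document order.
        var nodes = document.DocumentNode.Descendants().ToList();

        // A text node counts as visible when it is not blank after
        // entities such as &nbsp; have been decoded.
        bool IsVisibleText(HtmlNode n) =>
            n.NodeType == HtmlNodeType.Text &&
            !string.IsNullOrWhiteSpace(HtmlEntity.DeEntitize(n.InnerText));

        int firstText = nodes.FindIndex(IsVisibleText);
        int lastText = nodes.FindLastIndex(IsVisibleText);

        // When there is no visible text, both indexes are -1 and every
        // <br> satisfies the second condition, so all of them are returned.
        return nodes
            .Where((n, i) => n.Name == "br" && (i < firstText || i > lastText))
            .ToList();
    }

    public static void Main()
    {
        const string html = "<p><span><br> text <b>text &nbsp;<br></b>text <br></span></p>";
        var outerBreaks = FindOuterBreaks(html);
        Console.WriteLine($"outer <br> count: {outerBreaks.Count}");
    }
}
```

For the string in the question, the intent is that the middle <br> is skipped because visible text appears both before and after it.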
