
How to identify HTML tags in an HTML string

I have the HTML string below, and I am trying to identify the <br> tags at the start and at the end of the whole text inside it, using the following code:

 var htmlString = "<p><span><br> text <b>text &nbsp;<br></b>text <br></span></p>";
 var document = new HtmlDocument();
 document.LoadHtml(htmlString);

 var nodes = document.DocumentNode.SelectNodes("//br");

However, this returns every <br> node, whereas I only want the ones at the very start and the very end of the text in the HTML string below:

<p><span><br> text <b> text&nbsp;<br></b>text <br></span></p>

I expect 2 nodes but get 3, because the <br> tag in the middle of the text is counted as well.

Could anyone please help me achieve this? Many thanks in advance.

You can use the Split method to solve your problem. The suggestion below prints the text between the first and the last <br> tags; you can adapt the output to your requirements. A regex pattern might also work.

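const string tag = "<br>";
// Split(string) needs .NET Core 2.0+; on .NET Framework use Split(new[] { tag }, StringSplitOptions.None).
var splitedHtmlString = htmlString.Split(tag);
StringBuilder builder = new StringBuilder();
// Skip the first and last pieces, which lie outside the outermost <br> tags.
for (int i = 1; i < splitedHtmlString.Length - 1; i++)
{
     builder.Append(splitedHtmlString[i]);
     builder.Append(tag); // restore the separator that Split removed
}
// Drop the trailing <br> appended by the last iteration (assumes at least two <br> tags).
builder.Remove(builder.Length - tag.Length, tag.Length);
Console.WriteLine(builder.ToString());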

Output: text <b>text &nbsp;<br></b>text
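
As a variation on the Split idea above, here is a minimal sketch that locates the first and last <br> with IndexOf and LastIndexOf instead of rebuilding the string. The class name BrTrimmer and the method name Between are made up for illustration. It only matches the literal text "<br>", so forms such as <br/> or <br /> would need extra handling.

```csharp
using System;

static class BrTrimmer
{
    // Hypothetical helper: returns the text between the first and the last
    // occurrence of `tag`, or null when fewer than two occurrences exist.
    public static string Between(string html, string tag = "<br>")
    {
        int first = html.IndexOf(tag, StringComparison.OrdinalIgnoreCase);
        int last = html.LastIndexOf(tag, StringComparison.OrdinalIgnoreCase);

        // No tag at all, or only one tag (first and last are the same match).
        if (first < 0 || first == last)
        {
            return null;
        }

        int start = first + tag.Length;          // first character after the opening <br>
        return html.Substring(start, last - start);
    }
}
```

Calling BrTrimmer.Between(htmlString) on the string from the question should give the same text as the Split version, without throwing when the input contains fewer than two <br> tags.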

You can load your string into an HtmlDocument and filter by node, using the HtmlAgilityPack library:

HtmlDocument document = new HtmlDocument();

document.LoadHtml("your html code");

var htmlTag = document.DocumentNode.SelectNodes("//br");
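
Note that "//br" on its own still selects every <br>. To keep only the ones at the start and end of the text, one possible approach (a sketch, not necessarily what this answer intended) is to walk the nodes in document order and keep the <br> elements that come before the first visible text or after the last. The names EdgeBreaks and Find are made up for illustration; the code assumes the HtmlAgilityPack NuGet package is referenced and treats whitespace-only text, including &nbsp;, as invisible.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

static class EdgeBreaks
{
    // Hypothetical helper: returns the <br> nodes that appear before the first
    // visible text or after the last visible text of the document.
    public static List<HtmlNode> Find(string html)
    {
        var document = new HtmlDocument();
        document.LoadHtml(html);

        // <br> elements and non-blank text nodes, in document order.
        List<HtmlNode> significant = document.DocumentNode
            .Descendants()
            .Where(n => IsBreak(n) || IsVisibleText(n))
            .ToList();

        // Leading run of <br> nodes, then the trailing run (scanned from the end).
        var leading = significant.TakeWhile(IsBreak);
        var trailing = Enumerable.Reverse(significant).TakeWhile(IsBreak);

        // Distinct() avoids duplicates when the document contains no text at all.
        return leading.Concat(trailing).Distinct().ToList();
    }

    private static bool IsBreak(HtmlNode node)
    {
        return node.NodeType == HtmlNodeType.Element && node.Name == "br";
    }

    private static bool IsVisibleText(HtmlNode node)
    {
        // DeEntitize turns &nbsp; into a non-breaking space, which counts as whitespace.
        return node.NodeType == HtmlNodeType.Text
            && !string.IsNullOrWhiteSpace(HtmlEntity.DeEntitize(node.InnerText));
    }
}
```

For the string in the question, EdgeBreaks.Find(htmlString) should return two nodes, because the middle <br> has visible text on both sides. If a <br> followed by trailing whitespace only should not count as "at the end", the IsVisibleText check is the place to adjust.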
