How can I get a unique list of all tags from an HTML string? So far I am only able to extract the tags one by one.
Code
public static void HtmlParser()
{
string html = @"<TD >
<DIV align=right>Name :<B> </B></DIV></TD>
<TD width=""50%"">
<INPUT class=box value=John maxLength=16 size=16 name=user_name>
</TD>
<TR vAlign=center> <code> This is a <kwd>vba</kwd> code piece</code> Hi I am sujoy";
HtmlDocument htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml(html);
string code = htmlDoc.DocumentNode
.SelectSingleNode("//code").InnerHtml;
string TD = htmlDoc.DocumentNode
.SelectSingleNode("//td").InnerText; // HtmlAgilityPack lower-cases element names
}
For the above code I want the output to be a list of {"DIV","TD","TR","CODE"}.
Not sure exactly what you mean by "an unique list of all tags from a html string".
If you want every node in the HTML document (elements as well as text and comment nodes), use:
htmlDoc.DocumentNode.Descendants();
If you want a list of all <code> tags, one way to achieve that is using LINQ:
htmlDoc.DocumentNode.Descendants().Where(d => d.Name == "code");
Edit:
A list of all unique tags can be retrieved this way, for example:
htmlDoc.DocumentNode.Descendants().Where(d => !d.Name.StartsWith("#")).Select(d => d.Name).GroupBy(d => d).Select(g => g.Key)
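This uses LINQ to go through the following steps: take every descendant node, drop the nodes whose name starts with "#" (such as #text and #comment), project each remaining node to its name, group the names, and keep one key per group.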
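Put together as a complete program, that pipeline might look like the sketch below. It is only an illustration: the class and method names (TagLister, UniqueTags) are made up for this example, Distinct() is swapped in for the GroupBy/Select pair, and the names are upper-cased on the assumption that the output should match the casing shown in the question (HtmlAgilityPack reports Name in lower case).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // NuGet package, the same library used in the question

public static class TagLister
{
    // Hypothetical helper: returns each element name found in the HTML once.
    public static List<string> UniqueTags(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode
            .Descendants()
            // Keep real elements only; this skips #text and #comment nodes.
            .Where(n => n.NodeType == HtmlNodeType.Element)
            // Upper-case so "td" is reported as "TD" (an assumption about the desired output).
            .Select(n => n.Name.ToUpperInvariant())
            // Distinct() removes repeated names, like GroupBy followed by Select(g => g.Key).
            .Distinct()
            .ToList();
    }

    public static void Main()
    {
        string html = @"<TD><DIV align=right>Name :<B> </B></DIV></TD>
<TR vAlign=center><code>This is a <kwd>vba</kwd> code piece</code>";

        Console.WriteLine(string.Join(", ", UniqueTags(html)));
    }
}
```

Note that this lists every element name present in the markup, so for the HTML in the question it would also report tags such as B, INPUT and KWD, not only the four named there.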
Use htmlDoc.DocumentNode.Descendants(), and for a unique list use a HashSet:
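public static void HtmlParser()
{
HtmlDocument htmlDoc = new HtmlDocument();
htmlDoc.LoadHtml("Your html string containing tags like <div></div>...");
// A HashSet ignores duplicates, so each tag name is stored only once.
HashSet<string> hs = new HashSet<string>();
foreach (var dec in htmlDoc.DocumentNode.Descendants())
{
// Skip text and comment nodes, whose names are "#text" and "#comment".
if (dec.NodeType == HtmlNodeType.Element)
{
hs.Add(dec.Name);
}
}
}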