Library to extract data from an HTML string

Question

Are there any free/open-source C# libraries for extracting data from HTML?

Given the input below:

<div style="...">
 text part 1
</div>
<div style="...">
 text part 2
</div>

I want the output to be:

text part 1 text part 2

Answer 1

Yes, you can use HtmlAgilityPack to parse the HTML with XPath queries, as if it were XML.
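
A minimal sketch of that XPath approach, assuming the HtmlAgilityPack NuGet package is referenced. The names `DivTextExtractor` and `ExtractDivText` are hypothetical and chosen only for illustration; the XPath `//div` selects every div element in the document.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class DivTextExtractor
{
    // Hypothetical helper: joins the trimmed text of every <div> with single spaces.
    public static string ExtractDivText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection divs = doc.DocumentNode.SelectNodes("//div");
        if (divs == null)
        {
            return string.Empty;
        }

        // InnerText keeps the surrounding line breaks, so trim each piece.
        return string.Join(" ", divs.Select(d => d.InnerText.Trim()));
    }
}
```

For the input in the question, this should return text part 1 text part 2. Note that `//div` also matches nested divs, so their text would appear more than once; a narrower XPath may be needed for real pages.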

Answer 2

You can use HtmlAgilityPack, which is a very good library.

Then:

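// Requires: using System.Text; using HtmlAgilityPack;

// Returns the text content of an HTML string with all tags removed.
// Whitespace (line breaks, indentation) is kept exactly as it appears in the markup.
public string StripHTMLTags(string str)
{
    StringBuilder pureText = new StringBuilder();
    HtmlDocument doc = new HtmlDocument();
    doc.LoadHtml(str);

    // InnerText of each top-level node already includes the text of its descendants.
    foreach (HtmlNode node in doc.DocumentNode.ChildNodes)
    {
        pureText.Append(node.InnerText);
    }

    return pureText.ToString();
}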
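
StripHTMLTags returns the text with its original line breaks and indentation, so for the input in the question the result is not yet the single line text part 1 text part 2. One possible post-processing step is sketched below; `TextCleanup` and `CollapseWhitespace` are hypothetical names, not part of HtmlAgilityPack, while `HtmlEntity.DeEntitize` is the library's entity decoder.

```csharp
using System;
using System.Text.RegularExpressions;
using HtmlAgilityPack;

public static class TextCleanup
{
    // Hypothetical helper: decodes entities such as &amp; and collapses
    // every run of whitespace (spaces, tabs, line breaks) into one space.
    public static string CollapseWhitespace(string text)
    {
        string decoded = HtmlEntity.DeEntitize(text);
        return Regex.Replace(decoded, @"\s+", " ").Trim();
    }
}
```

It would be applied to the stripped text, for example `TextCleanup.CollapseWhitespace(StripHTMLTags(html))`.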

2 answers

solution1
6 2011-12-17 23:07:01

solution2
4 ACCEPTED 2011-12-17 23:15:12
