Get all <li> elements inside a certain <div> with C#

Question

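I have a web page that contains several <div> elements.

I want to write a program that prints all the <li> elements inside a <div> that come after a certain <h4> header. Can anyone give me some help or sample code?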

<div id="content">
    <h4>Header</h4>
    <ul>
        <li><a href...></a> THIS IS WHAT I WANT TO GET</li>
    </ul>
</div>

Answer 1

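When parsing HTML in C#, don't try to write your own parser. The HTML Agility Pack can almost certainly do what you want!

Which parts are constant:

The "id" on the DIV?
The h4?

Searching the whole HTML document and reacting to H4 alone is likely to be a mess, whereas if you know the DIV has the ID "content", just look for that!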

// Requires: using System.Linq;
var doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(yourHtml);

// SelectNodes returns null when nothing matches, so check before filtering.
var allDivs = doc.DocumentNode.SelectNodes("//div");
if ( allDivs != null )
{
   var divs = allDivs.Where(div => div.Descendants("h4").Any());

   // You now have all of the divs with an 'h4' inside of it.

   // The rest of the element structure, if constant, needs to be examined to get
   // the rest of the content you're after.
}
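
If the DIV's ID really is constant, the lookup can go straight to it instead of filtering every div. The following is only a minimal sketch under that assumption: the class and method names (ListItemFinder, GetItemsAfterHeader) are made up for illustration, and it assumes the <ul> is a sibling of the <h4> directly inside <div id="content">, as in the question's markup.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;

public static class ListItemFinder
{
    // Hypothetical helper: returns the text of every <li> in the first <ul>
    // sibling that follows each <h4> child of <div id="content">.
    public static List<string> GetItemsAfterHeader(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Assumption: the <h4> and <ul> are direct children of the div.
        var nodes = doc.DocumentNode.SelectNodes(
            "//div[@id='content']/h4/following-sibling::ul[1]/li");

        var items = new List<string>();
        if (nodes == null)
        {
            return items; // SelectNodes returns null when nothing matches
        }

        foreach (HtmlNode li in nodes)
        {
            // InnerText includes the text of nested elements such as <a>.
            items.Add(HtmlEntity.DeEntitize(li.InnerText).Trim());
        }
        return items;
    }
}
```

Calling ListItemFinder.GetItemsAfterHeader(yourHtml) and writing each entry with Console.WriteLine would print the list items. If the real page nests the list more deeply, the XPath would need adjusting.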

Answer 2

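If it's a web page, why do you need to parse the HTML at all? Doesn't the technology you used to build the page give you access to all of its elements? For example, if you are using ASP.NET, you can assign IDs to your UL and LI (with the runat="server" attribute), and they will be available in the code-behind.

Can you explain what you are trying to do? If you are making a web request and downloading the HTML as a string, then scraping the HTML makes sense.

Edit: I think this should work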

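HtmlDocument doc = new HtmlDocument();
doc.Load(myHtmlFile);

foreach (HtmlNode div in doc.DocumentNode.Descendants("div"))
{
    // GetAttributeValue returns the default ("") when a div has no id attribute.
    if (div.GetAttributeValue("id", "") != "content")
        continue;

    foreach (HtmlNode ul in div.Elements("ul"))
    {
        // Skip whitespace text nodes to find the element just before the <ul>.
        HtmlNode previous = ul.PreviousSibling;
        while (previous != null && previous.NodeType != HtmlNodeType.Element)
            previous = previous.PreviousSibling;

        if (previous != null && previous.Name == "h4" && previous.InnerText.Trim() == "Header")
        {
            foreach (HtmlNode li in ul.Elements("li"))
            {
                // li is one <li> child of the list that follows the header
            }
        }
    }
}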

Answer 3

If all you want is the content between all the <li></li> tags that sit under the <div id="content"> tag and come immediately after an <h4> tag, then this should suffice:

//Load your document first.
//Load() accepts a Stream, a TextReader, or a string path to the file on your computer
//If the entire document is loaded into a string, then use .LoadHtml() instead.
HtmlDocument mainDoc = new HtmlDocument();
mainDoc.Load(@"c:\foobar.html"); //verbatim string, so "\f" is not read as an escape sequence


//Select all the <li> nodes that are inside of the element with the id of "content"
// and come directly after an <h4> tag. The leading "." keeps the search inside that
// element; a bare "//" would search the whole document.
HtmlNodeCollection processMe = mainDoc.GetElementbyId("content")
                                      .SelectNodes(".//h4/following-sibling::*[1]//li");

//Iterate through each <li> node and print the inner text to the console
//(SelectNodes returns null when nothing matches)
if (processMe != null)
{
    foreach (HtmlNode listElement in processMe)
    {
        Console.WriteLine(listElement.InnerText);
    }
}

3 answers

Answer 1
1, accepted, 2012-07-20 09:04:45

Answer 2
0, 2012-07-20 09:11:36

Answer 3
0, 2012-07-20 11:28:04
