
C# HtmlAgilityPack: how to grab the InnerText of all occurrences of a specific tag?

As briefly explained in the title, I'm trying to grab the InnerText of every occurrence of a tag and add it to a List. Here is my code as well as my HTML:

HTML body:

<body cz-shortcut-listen="true">
{"draw":1,"recordsTotal":9437,"recordsFiltered":9437,"data":[["
<a target="\&quot;_blank\&quot;" href="\&quot;\/id\/115739257\&quot;">AK-47 | Aquamarine Revenge (Factory New)&lt;\/a&gt;","</a>
<a target="\&quot;_blank\&quot;"href="\&quot;\/id\/115739257\&quot;">33.87&lt;\/a&gt;","</a>
<a target="\&quot;_blank\&quot;" href="\&quot;\/id\/115739257\&quot;">34.53&lt;\/a&gt;","</a>
<a target="\&quot;_blank\&quot;" href="\&quot;https:\/\/track.steamanalyst.com\/730\/115739257\/all\&quot;">25.9&lt;\/a&gt;","</a>
<a target="\&quot;_blank\&quot;" href="\&quot;\/id\/115739257\&quot;">164&lt;\/a&gt;","</a>
<a target="\&quot;_blank\&quot;" href="\&quot;\/id\/115739257\&quot;">-0.16&lt;\/a&gt;","</a>
<a target="\&quot;_blank\&quot;" href="\&quot;\/id\/115739257\&quot;">2.10945&lt;\/a&gt;"],["</a>
<a target="\&quot;_blank\&quot;" href="\&quot;\/id\/115734122\&quot;">AK-47 | Aquamarine Revenge (Minimal Wear)&lt;\/a&gt;","</a>
<a target="\&quot;_blank\&quot;" href="\&quot;\/id\/115734122\&quot;">23.44&lt;\/a&gt;","</a>
<a target="\&quot;_blank\&quot;" href="\&quot;\/id\/115734122\&quot;">21.85&lt;\/a&gt;","</a>
<a target="\&quot;_blank\&quot;" href="\&quot;https:\/\/track.steamanalyst.com\/730\/115734122\/all\&quot;">17.61&lt;\/a&gt;","</a>
<a target="\&quot;_blank\&quot;" href="\&quot;\/id\/115734122\&quot;">533&lt;\/a&gt;","</a>
<a target="\&quot;_blank\&quot;" href="\&quot;\/id\/115734122\&quot;">-2.65&lt;\/a&gt;","</a>
<a target="\&quot;_blank\&quot;" href="\&quot;\/id\/115734122\&quot;">0.94387&lt;\/a&gt;"],["</a>
</body>

My code:

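List<string> Data = new List<string>();
int j = 0; // index of the current cell within a row (each row has 7 cells)
foreach (HtmlNode node in doc.DocumentNode.SelectNodes("//a[@target]"))
{
    if (j <= 6)
    {
        Data.Add(node.InnerText);
        if (j == 6)
        {
            // Row complete: store its first two cells, then start the next row.
            JsonDB.Add(Data[0], Data[1]);
            Data.Clear();
            j = 0;
        }
        else
        {
            j++;
        }
    }
}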

Problem with this code: node.InnerText returns a joined string of the InnerTexts of all tags in the body! Basically, this is what the first node in doc.DocumentNode.SelectNodes("//a[@target]") shows:

AK-47 | Aquamarine Revenge (Factory New)","33.8","34.34","25.89","170",
"-1.27","2.03181"],[...

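All matching tags in the document (an XPath that starts with // always searches from the document root):

doc.DocumentNode.SelectNodes("//a[@target]")

Matching tags below the node the query is called on (a leading . makes the XPath relative; called on doc.DocumentNode, both forms return the same nodes):

doc.DocumentNode.SelectNodes(".//a[@target]")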
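The difference between the two XPath forms only becomes visible when the query is run from a node below the root. The following is a minimal sketch of that, assuming the HtmlAgilityPack NuGet package is referenced; the markup, the class name `XPathScopeDemo`, and the helper `CountFromFirstDiv` are made up for illustration and are not part of the question.

```csharp
using System;
using HtmlAgilityPack;

static class XPathScopeDemo
{
    // Hypothetical markup: two containers, each holding one link.
    const string Html =
        "<div id='first'><a target='_blank'>one</a></div>" +
        "<div id='second'><a target='_blank'>two</a></div>";

    // Runs the given XPath from the first <div> and returns how many nodes it matched.
    public static int CountFromFirstDiv(string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(Html);
        HtmlNode firstDiv = doc.DocumentNode.SelectSingleNode("//div[@id='first']");
        HtmlNodeCollection matches = firstDiv.SelectNodes(xpath);
        // SelectNodes returns null (not an empty collection) when nothing matches.
        return matches == null ? 0 : matches.Count;
    }

    static void Main()
    {
        // Absolute path: searches the whole document, so both links match.
        Console.WriteLine(CountFromFirstDiv("//a[@target]"));
        // Relative path: searches only inside the first <div>, so one link matches.
        Console.WriteLine(CountFromFirstDiv(".//a[@target]"));
    }
}
```

Neither form fixes the joined InnerText from the question by itself, since that comes from the shape of the response rather than from the scope of the query.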

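SOLUTION: The response is JSON whose string values contain HTML fragments (note the escaped quotes and the escaped closing tags, <\/a>), so it has to be parsed as a JSON object first; only the individual cells are then loaded as HTML.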

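// Requires: using Newtonsoft.Json.Linq; using HtmlAgilityPack;
JObject jresponse = JObject.Parse(response); // response: the raw JSON string
HtmlDocument doc = new HtmlDocument();
foreach (JArray row in jresponse["data"])
{
    List<string> Data = new List<string>();
    foreach (JToken entry in row)
    {
        // Each cell is a small HTML fragment that holds a single <a> element.
        doc.LoadHtml(entry.ToString());
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//a[@target]");
        if (node != null) // SelectSingleNode returns null when nothing matches
        {
            Data.Add(node.InnerText);
        }
    }
    // Data now holds the cell texts of this row.
}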
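For reference, here is a self-contained sketch of the same idea that can be compiled and run as is. It assumes the HtmlAgilityPack and Newtonsoft.Json NuGet packages; the class `RowExtractor`, the method `ExtractRows`, and the shortened sample response are hypothetical and only mirror the shape of the data shown in the question.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack;
using Newtonsoft.Json.Linq;

static class RowExtractor
{
    // Parses a JSON response whose "data" rows contain HTML fragments
    // and returns the link text of every cell, grouped by row.
    public static List<List<string>> ExtractRows(string response)
    {
        var rows = new List<List<string>>();
        var doc = new HtmlDocument();
        JObject json = JObject.Parse(response);

        foreach (JToken row in (JArray)json["data"])
        {
            var cells = new List<string>();
            foreach (JToken entry in row)
            {
                // After JSON parsing the escapes are gone, so this is plain HTML.
                doc.LoadHtml(entry.ToString());
                HtmlNode link = doc.DocumentNode.SelectSingleNode("//a[@target]");
                // Fall back to the raw cell text if the cell holds no link.
                cells.Add(link != null ? link.InnerText : entry.ToString());
            }
            rows.Add(cells);
        }
        return rows;
    }

    static void Main()
    {
        // Hypothetical sample: one row with two cells (single-quoted attributes
        // keep the C# string literal readable).
        string response =
            @"{""data"":[[""<a target='_blank' href='\/id\/1'>Sample Item<\/a>""," +
            @"""<a target='_blank' href='\/id\/1'>33.87<\/a>""]]}";

        foreach (List<string> row in ExtractRows(response))
        {
            Console.WriteLine(string.Join(" | ", row)); // prints "Sample Item | 33.87"
        }
    }
}
```

Returning the rows as a list of lists keeps the sketch independent of JsonDB from the question; the first two cells of each row could be passed to it afterwards.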

Notice: the technical posts on this site follow the CC BY-SA 4.0 license. If you republish this post, please credit this site's URL or the original source.