C# HtmlAgilityPack XPath problems: trouble finding H4 InnerText

I have a method that finds everything I am looking for in a section of a webpage, except that I am stuck trying to find an H4 within the matched nodes. The XPath //div[@class='job '] correctly finds all 8 occurrences I am looking for, but I hit problems when I try to traverse those 8 occurrences.

Here is the HTML output of the code I am looking inside.

<div class="job_art ">
<div style="background: #444      url('https://a.akamaihd.net/mwfb/mwfb/graphics/jobs/chicago/meet_with_the_south_gang_family_    760x225_01.jpg') 50% 0 no-repeat;">
</div>
</div>
<div class="job_details clearfix">
<h4>Meet With the South Gang Family</h4>
<div class="mastery_bar" title="Indicates how much of this Job you&#39;ve mastered.      Master Jobs to earn Skill Points.">
<div style="width: 0%" class="noHighlight"></div>
<p>100%     Mastered</p>
<div style="width: 0%"><p>100% Mastered</p></div>
</div>
<ul class="uses clearfix" style="width:100px;">
<li class="energy" base_value="2" current_value="2" title="Spend 2     Energy to do this Job once.">2</li>
</ul>
<ul class="pays clearfix" style="width:120px" title="Earn XP, City Cash and Loot items while doing Jobs.">
<li class="experience" base_value="2" current_value="2">2</li>
<li class="cash_icon_jobs_8" base_value="2" current_value="2">2</li>
</ul>
<a id='btn_dojob_1' class='sexy_button_new sexy_energy_new medium orange impulse_buy' selector='#inner_page' requirements='{"energy":2}' precall='BrazilJobs.preDoJob' callback='BrazilJobs.doJob' href='remote/h.php?job=1&tab=1&clkdiv=btn_dojob_1'><span><span>Do Job</span></span></a>
</div>
<div class="job_additional_results">
<div id="loot-bandit-1" class="lootContainer"></div>
<div class="previous_loot"></div>
</div>
<div id="bandit-contextual-1" class="contextual bandit-contextual"></div>

It always finds something else, such as "Clams(Bank)", and I have no idea why. The problem starts with:

  string MissionName = node.SelectSingleNode("//h4").InnerText;

I have tried numerous XPath expressions, such as //div[h4[1]] and h4[1]. I only need the first occurrence, since it only occurs once. Where does the problem start in my code?

I need the inner text "Meet With the South Gang Family".

public static List<string> GetMissions()
    {
        List<string> FoundMissions = new List<string>();

        HTML_CONTENT = HTML_CONTENT.Replace("\r", "");
        HTML_CONTENT = HTML_CONTENT.Replace("\t", "");
        HTML_CONTENT = HTML_CONTENT.Replace("\n", "");
        HTML_CONTENT = HTML_CONTENT.Replace("\\", "");

        HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
        doc.Load(new StringReader(HTML_CONTENT));

        if(doc.DocumentNode == null)
            return FoundMissions;
        var DivNodes = doc.DocumentNode.SelectNodes("//div[@class='job ']");
        if (DivNodes != null)
        {
            string Count = DivNodes.Count.ToString();

Like I said, it finds all 8 occurrences fine. I debugged it and got the HTML shown at the top of this post, so I think this part is fine.

            foreach (HtmlNode node in DivNodes)
            {

                string MissionName = node.SelectSingleNode("//h4").InnerText;
            }
        }

        return FoundMissions;
    }

You need to explicitly indicate that the XPath query is relative to the current node by adding a single dot (.) at the beginning:

string MissionName = node.SelectSingleNode(".//h4").InnerText;

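Otherwise, the XPath is evaluated from the root of the document, even when it is called on a child node. "//h4" therefore returns the first h4 in the whole page on every iteration, which is likely why your attempt kept returning an unrelated result such as "Clams(Bank)".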
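
To show the fix in context, here is a minimal sketch of the loop from the question using the relative query. The class name MissionScraper, the method name GetMissionNames, and the html parameter are hypothetical (the question reads from a static HTML_CONTENT field instead); the null checks and the call to HtmlEntity.DeEntitize are additions that were not in the original code.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // third-party NuGet package, as used in the question

public static class MissionScraper
{
    // Hypothetical helper: takes the HTML as a parameter rather than
    // reading the question's static HTML_CONTENT field.
    public static List<string> GetMissionNames(string html)
    {
        var found = new List<string>();

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Same outer query as in the question.
        var divNodes = doc.DocumentNode.SelectNodes("//div[@class='job ']");
        if (divNodes == null)
            return found; // SelectNodes returns null when nothing matches

        foreach (HtmlNode node in divNodes)
        {
            // ".//h4" selects the first h4 among this node's descendants.
            // "//h4" would select the first h4 in the whole document.
            HtmlNode h4 = node.SelectSingleNode(".//h4");
            if (h4 != null)
                found.Add(HtmlEntity.DeEntitize(h4.InnerText).Trim());
        }

        return found;
    }
}
```

Checking the result of SelectSingleNode for null before reading InnerText avoids a NullReferenceException for any matched div that happens to contain no h4.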
