
XPATH using contains and following-sibling

I know there are a lot of posts about this, but XPath seems to be my weak point when it comes to web scraping. The code below does not seem to work, even though I'm convinced it's correct.

Basically, I'm looking for a td that contains the text "Pivot Point 2nd Level Resistance" and taking the value of the td that immediately follows it. What went wrong?

string RS2 = doc.DocumentNode.SelectSingleNode("//td[contains(text(),'Pivot Point 2nd Level Resistance')]/following-sibling::td[1]").InnerText;

Below is the markup I'm scraping:

    <tr data-ng-repeat="point in cheatSheetData | filter:categoryFilter" data-ng-class="point.class" class="high support-resistance">
        <td class="label support-resistance highlight" data-ng-class="{'highlight': point.labelSupportResistance}"> Pivot Point 2nd Level Resistance </td>
        <td class="value"> 9.43 </td>
        <td class="label pivot-points" data-ng-class="{'highlight': point.labelTurningPoints}"> </td>
    </tr>

EDIT: It looks like the website I'm scraping now loads this data after the initial page load, so the node is not available when I scrape the raw HTML. I confirmed this by setting up PhantomJS and Selenium as a headless browser, where the same XPath works fine. This is not the route I want to take, but the issue has been found.

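Steps in a path expression are separated by /, so for the expression to be syntactically valid you want //td[contains(text(),'Pivot Point 2nd Level Resistance')]/following-sibling::td[1]. I would also drop the text() and use //td[contains(.,'Pivot Point 2nd Level Resistance')]/following-sibling::td[1], because . tests the whole string value of the cell, whereas contains(text(), ...) in XPath 1.0 only looks at the cell's first text node.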
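To illustrate the difference between text() and . in that predicate, here is a minimal sketch. It uses the built-in System.Xml XPath 1.0 engine as a stand-in for HtmlAgilityPack, and the markup is hypothetical: the label is split by an inline <b> element, which is not the case in the question's HTML. The class and method names are made up for the example.

```csharp
using System;
using System.Xml;

public static class XPathTextDemo
{
    // Hypothetical markup (not from the question): the label is interrupted by <b>.
    public const string Xml =
        "<table><tr><td>Pivot Point <b>2nd Level</b> Resistance</td><td>9.43</td></tr></table>";

    // Returns the trimmed text of the first matching node, or null if nothing matches.
    public static string Select(string xpath)
    {
        var doc = new XmlDocument { PreserveWhitespace = true };
        doc.LoadXml(Xml);
        XmlNode node = doc.SelectSingleNode(xpath);
        return node == null ? null : node.InnerText.Trim();
    }

    public static void Main()
    {
        const string label = "Pivot Point 2nd Level Resistance";

        // text() yields a node-set; contains() converts it using only the first
        // text node ("Pivot Point "), so the full label is never found.
        string viaText = Select("//td[contains(text(),'" + label + "')]/following-sibling::td[1]");

        // . is the string value of the whole cell, including text inside <b>.
        string viaDot = Select("//td[contains(.,'" + label + "')]/following-sibling::td[1]");

        Console.WriteLine("text(): " + (viaText ?? "(no match)"));
        Console.WriteLine(".:      " + (viaDot ?? "(no match)"));
    }
}
```

For the markup in the question, where the cell holds a single text node, both forms select the same cell, so this only matters if the site ever wraps part of the label in another element.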

When I write a .NET Framework 4.6.1 console program using the latest NuGet package of HtmlAgilityPack and the code

            string html = @"<html><body><table><tr data-ng-repeat=""point in cheatSheetData | filter:categoryFilter"" data-ng-class=""point.class"" class=""high support-resistance"">
                <td class=""label support-resistance highlight"" data-ng-class=""{'highlight': point.labelSupportResistance}"">
                    Pivot Point 2nd Level Resistance
                </td>
                <td class=""value"">
                    9.43
                </td>
                <td class=""label pivot-points"" data-ng-class=""{'highlight': point.labelTurningPoints}"">

                </td>
</tr></table></body></html>";

            HtmlDocument doc = new HtmlDocument();
            doc.LoadHtml(html);

            string RS2 = doc.DocumentNode.SelectSingleNode("//td[contains(text(),'Pivot Point 2nd Level Resistance')]/following-sibling::td[1]").InnerText;

            Console.WriteLine(RS2);

it outputs

                9.43

so based on that the XPath is fine.

You might need to edit your question to tell us which result you get and try to add minimal but complete snippets of input and code where your attempt fails.

After receiving confirmation that my XPath was correct, I tested my code with a headless browser (the PhantomJS driver with Selenium), and the XPath now returns a value. It seems the website has changed, and the node has not yet been generated when the raw HTML is downloaded. Not the path I want to take, but the issue is found.

Here is my full code if anyone is interested:

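using OpenQA.Selenium;
using OpenQA.Selenium.PhantomJS;

IWebDriver driver = new PhantomJSDriver();
driver.Navigate().GoToUrl(Url); // Url is the address of the page being scraped

string RS2 = driver.FindElement(By.XPath("//td[contains(.,'Pivot Point 2nd Level Resistance')]/following-sibling::td[1]")).Text;
driver.Quit(); // shut down the PhantomJS process when done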
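Before reaching for a headless browser, it may help to confirm that the row really is missing from the downloaded markup rather than assuming the XPath is wrong. The sketch below is only an illustration of that check: it uses System.Xml as a stand-in for HtmlAgilityPack, and both markup strings are hypothetical (an empty table for what the server might send before scripts run, and a filled row for what the browser renders). A null result is handled explicitly instead of calling .InnerText on it.

```csharp
using System;
using System.Xml;

public static class MissingNodeDemo
{
    const string XPath =
        "//td[contains(.,'Pivot Point 2nd Level Resistance')]/following-sibling::td[1]";

    // Returns the trimmed cell value, or null when the row is not in the markup.
    public static string TryGetValue(string markup)
    {
        var doc = new XmlDocument();
        doc.LoadXml(markup);
        XmlNode node = doc.SelectSingleNode(XPath);
        return node == null ? null : node.InnerText.Trim();
    }

    public static void Main()
    {
        // Hypothetical: markup as downloaded, before any script has added rows.
        string beforeScripts = "<table></table>";
        // Hypothetical: markup after the browser has rendered the row.
        string afterScripts =
            "<table><tr><td> Pivot Point 2nd Level Resistance </td><td> 9.43 </td></tr></table>";

        Console.WriteLine(TryGetValue(beforeScripts) ?? "row not present in raw markup");
        Console.WriteLine(TryGetValue(afterScripts) ?? "row not present in raw markup");
    }
}
```

If the first case is what the plain download produces, the XPath is not at fault, which matches the diagnosis above: the data arrives after the page has loaded.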

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, please contact yoyou2525@163.com.

 