Selenium C# driver.PageSource - 'is too long, or a component of the specified path is too long.'

I'm trying to pass driver.PageSource from Selenium C# to HTML Agility Pack, but the line htmlDoc.Load(driver.PageSource); throws the error: '...' is too long, or a component of the specified path is too long.

P.S. Selenium with Python and Beautiful Soup doesn't produce this error when I do the same thing in Python instead of C#.

How can I resolve this problem?

Full code:

using System;
using System.Linq; // needed for the .ToList() call below
using System.Threading;
using HtmlAgilityPack;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Support.UI;

namespace SeleniumSharp
{
    public static class WebScraping
    {
        public static void GetPageData()
        {
            // initial setup
            IWebDriver driver = new ChromeDriver();
            driver.Navigate().GoToUrl("<url>");

            // dropdown
            var dropdown1 = driver.FindElement(By.Id("cpMain_ucc1_ctl00_liResidentialFront"));
            dropdown1.Click();
            
            // enter search query
            var search = driver.FindElement(By.Id("cpMain_ucc1_ctl00_txtResidentialSearchBox"));
            search.Click();
            search.SendKeys("london");
            Thread.Sleep(3000);

            // submit search
            var submit = driver.FindElement(By.XPath("//div[@id='cpMain_ucc1_ctl00_pnlContentResidential']//a[@class='search-button']"));
            submit.Click();

            // Html Agility Pack
            HtmlDocument htmlDoc = new HtmlDocument();
            htmlDoc.Load(driver.PageSource);

            var address = htmlDoc.DocumentNode
                .SelectNodes("//div[@class='grid-address']")
                .ToList();

            foreach(var item in address)
            {
                Console.WriteLine(item.InnerText);
            }

        }

        
    }
}

This line of code throws the error:

htmlDoc.Load(driver.PageSource);

Error:

'<html source>'is too long, or a component of the specified path is too long.
at System.IO.PathHelper.GetFullPathName(ReadOnlySpan`1 path, ValueStringBuilder& builder)
   at System.IO.PathHelper.Normalize(String path)
   at System.IO.Path.GetFullPath(String path)
   at System.IO.FileStream..ctor(String path, FileMode mode, FileAccess access, FileShare share, Int32 bufferSize, FileOptions options)
   at System.IO.StreamReader.ValidateArgsAndOpenPath(String path, Encoding encoding, Int32 bufferSize)  
   at System.IO.StreamReader..ctor(String path, Encoding encoding)
   at HtmlAgilityPack.HtmlDocument.Load(String path)

This happens because you are calling Load instead of LoadHtml. Load expects a path to a file that contains HTML, not the HTML source itself (driver.PageSource). That is why the stack trace goes through Path.GetFullPath and FileStream: the whole page source is being treated as a file path.

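// From a file: Load expects a path to an HTML file on disk
var docFromFile = new HtmlDocument();
docFromFile.Load(filePath);

// From a string: LoadHtml expects the HTML markup itself
var docFromString = new HtmlDocument();
docFromString.LoadHtml(html);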

So try using:

htmlDoc.LoadHtml(driver.PageSource);
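
For reference, here is a minimal sketch of the parsing step with that fix applied, assuming the HtmlAgilityPack NuGet package is installed. The class name LoadHtmlExample, the helper ExtractAddresses, and the sample markup are hypothetical and only for illustration; a literal string stands in for driver.PageSource so the snippet runs without a browser. It also guards against SelectNodes returning null, which is what HTML Agility Pack does when the XPath matches nothing.

```csharp
using System;
using HtmlAgilityPack;

public static class LoadHtmlExample
{
    // Hypothetical helper: returns the trimmed inner text of every
    // <div class="grid-address"> found in the given markup.
    public static string[] ExtractAddresses(string html)
    {
        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(html); // parses markup held in memory, not a file path

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var nodes = htmlDoc.DocumentNode.SelectNodes("//div[@class='grid-address']");
        if (nodes == null)
        {
            return Array.Empty<string>();
        }

        var result = new string[nodes.Count];
        for (int i = 0; i < nodes.Count; i++)
        {
            result[i] = nodes[i].InnerText.Trim();
        }
        return result;
    }

    public static void Main()
    {
        // Assumed sample markup standing in for driver.PageSource.
        string pageSource =
            "<html><body><div class='grid-address'>1 Example Street, London</div></body></html>";

        foreach (var address in ExtractAddresses(pageSource))
        {
            Console.WriteLine(address);
        }
    }
}
```

In the original method, the same call would be ExtractAddresses(driver.PageSource) after the search has been submitted and the results have loaded.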
