Web crawling a site with a captcha in C# using AngleSharp

Question

I am crawling a government website that uses reCAPTCHA. Is this legal or illegal? In the site's back-end code I found some commented-out links besides the ones I mention below; these links are not used anywhere on the website. I am crawling the data through such a link. Is it fine to crawl data through that link, or might the website owners block my IP address if I do? This is the code I use to crawl the data:

    // Assumes the AngleSharp 0.9.x API used in the question (HttpRequester.Headers,
    // WithDefaultLoader(requesters: ...)) and that this runs inside an async method.
    try
    {
        var requester = new HttpRequester();
        requester.Headers["User-Agent"] = "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.2704.106 Safari/537.36";

        // Use the custom requester and keep cookies, so the session cookie set by
        // the first GET is sent again when the form is submitted.
        var configuration = Configuration.Default
            .WithDefaultLoader(requesters: new[] { requester })
            .WithCookies();
        var context = BrowsingContext.New(configuration);

        string url = "http://www.mca.gov.in/mcafoportal/viewSignatoryDetails.do";
        await context.OpenAsync(url);

        var form = context.Active?.QuerySelector<IHtmlFormElement>("form[name='signatoryForm']");
        if (form == null)
        {
            Console.WriteLine("Form 'signatoryForm' not found");
            return;
        }

        // Set the named form fields and submit; as in the original code,
        // context.Active is then expected to hold the response document.
        await form.SubmitAsync(new
        {
            companyID = "U30009KA2001PTC029692",
            displayCaptcha = "false"
        });

        var sdTable = context.Active?.QuerySelector<IHtmlTableElement>("table[id='signatoryDetails']");
        if (sdTable == null)
        {
            Console.WriteLine("No result found");
            return;
        }

        // Walk the body rows and print every cell, instead of hard-coding
        // Children[1] and cell indexes 0..7 (which throw if the markup differs).
        foreach (var body in sdTable.Bodies)
        {
            foreach (var row in body.Rows)
            {
                foreach (var cell in row.Cells)
                {
                    Console.WriteLine(cell.TextContent.Trim());
                }
                Console.WriteLine("------------------------------");
            }
        }
    }
    catch (Exception ex)
    {
        // InnerException can be null, so fall back to the exception itself.
        Console.WriteLine((ex.InnerException ?? ex).Message);
    }

I can crawl the data with this URL: Index Charges. But when I change it to this URL: Signatory, the crawl throws an error or does not work the way the first URL does. Please help me figure out what I am missing.

Answer 1

I am not 100% sure I understand your question. Nevertheless, I hope the following answer helps a bit...

reCAPTCHA usually requires JavaScript (as far as I know there is a fallback variant, but I am not sure whether it is used on your sites). Therefore, even if your form submission is otherwise valid, you will never obtain a valid captcha token.
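The missing-token problem can be made concrete with a small heuristic check. This sketch is not part of AngleSharp: the class `RecaptchaCheck` and its methods are hypothetical, and it assumes the site uses the reCAPTCHA v2 widget, whose markup typically contains the `g-recaptcha` class and the `https://www.google.com/recaptcha/api.js` script, and whose JavaScript writes the solved token into a field named `g-recaptcha-response`. A plain substring test can give false positives, so treat it only as a quick diagnostic.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: heuristics for spotting a reCAPTCHA v2 widget in raw HTML
// and for checking whether a form payload carries the widget's token.
public static class RecaptchaCheck
{
    // Markers a reCAPTCHA v2 page usually contains (assumption: v2 widget).
    private static readonly string[] Markers =
    {
        "g-recaptcha",                     // widget container class (also matches the token field name)
        "www.google.com/recaptcha/api.js"  // loader script URL
    };

    // True if the HTML contains any marker. Substring test only, no HTML parsing.
    public static bool PageUsesRecaptcha(string html) =>
        Markers.Any(m => html.IndexOf(m, StringComparison.OrdinalIgnoreCase) >= 0);

    // The widget's JavaScript fills "g-recaptcha-response" once the captcha is solved;
    // a payload built without running that script has no non-empty value for it.
    public static bool PayloadHasToken(IReadOnlyDictionary<string, string> fields) =>
        fields.TryGetValue("g-recaptcha-response", out var token) && !string.IsNullOrEmpty(token);
}
```

For the payload in the question (`companyID`, `displayCaptcha`), `PayloadHasToken` returns false: setting `displayCaptcha` to `"false"` adds no reCAPTCHA token to the request.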

There is AngleSharp.Scripting.JavaScript for enabling JavaScript, but keep in mind that it is only experimental and only works for simple scripts. The scripts in question may be too much for it.

Answer 1 (score 0), 2018-12-09 23:22:30