
How to get the data programmatically using WebClient / HttpClient?

[image]

I want to download the data from https://eauction.ccmc.gov.in/frm_scduled_items.aspx for a date selected from the dropdown list.

    private async Task Cbetest()
    {
        using (var client = new HttpClient())
        {
            client.BaseAddress = new Uri("https://eauction.ccmc.gov.in");
            var content = new FormUrlEncodedContent(new[]
            {
                new KeyValuePair<string, string>("ctl00$ContentPlaceHolder1$gridedit$ctl14$ctl02", "17/02/2016")
            });
            var result = await client.PostAsync("/frm_scduled_items.aspx", content);
            string resultContent = await result.Content.ReadAsStringAsync();
            Console.WriteLine(resultContent);
        }
    }

I want to download the data shown in the image above.

You need to do a little extra work to simulate a postback before you can scrape an ASP.NET WebForms application. Mostly, you need to pass along valid ViewState and EventValidation parameters, which you can retrieve from an initial GET request.
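
To illustrate where those values live: WebForms renders them as hidden `<input>` elements inside the page's form. The sketch below pulls them out of a made-up HTML fragment with a regular expression, assuming double-quoted attributes with `name` appearing before `value`. The `HiddenFields` class and the sample values are hypothetical, and a real HTML parser is more robust against markup variations.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HiddenFields
{
    // Matches <input ... name="__SOMETHING" ... value="..."> within a single tag.
    // Assumption: attributes are double-quoted and name comes before value.
    private static readonly Regex HiddenInput = new Regex(
        @"<input[^>]*\bname=""(__[A-Z]+)""[^>]*\bvalue=""([^""]*)""",
        RegexOptions.IgnoreCase);

    // Returns every double-underscore field (e.g. __VIEWSTATE) found in the HTML.
    public static Dictionary<string, string> Extract(string html)
    {
        var fields = new Dictionary<string, string>();
        foreach (Match m in HiddenInput.Matches(html))
        {
            fields[m.Groups[1].Value] = m.Groups[2].Value;
        }
        return fields;
    }
}

public static class Program
{
    public static void Main()
    {
        // Made-up fragment shaped like the hidden fields a WebForms page emits.
        string sample =
            "<input type=\"hidden\" name=\"__VIEWSTATE\" id=\"__VIEWSTATE\" value=\"dDwtMTIz\" />" +
            "<input type=\"hidden\" name=\"__EVENTVALIDATION\" id=\"__EVENTVALIDATION\" value=\"/wEWAgK\" />";

        foreach (var pair in HiddenFields.Extract(sample))
        {
            Console.WriteLine(pair.Key + " = " + pair.Value);
        }
    }
}
```

Collecting every double-underscore field, rather than naming two of them, means any extra hidden field the page happens to emit is carried along in the postback as well.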

I'm using the HTML Agility Pack to make parsing the initial response easier. I recommend you look into it if you're planning to scrape HTML.

The following seems to get the results you're looking for, though I haven't looked too deeply into the response HTML.

    using System;
    using System.Collections.Generic;
    using System.Net.Http;
    using HtmlAgilityPack;

    using (var client = new HttpClient())
    {
        client.BaseAddress = new Uri("https://eauction.ccmc.gov.in");

        // Initial GET: the response carries the hidden fields WebForms expects back.
        var initial = await client.GetAsync("/frm_scduled_items.aspx");
        var initialContent = await initial.Content.ReadAsStringAsync();

        var htmlDoc = new HtmlDocument();
        htmlDoc.LoadHtml(initialContent);

        // Read the hidden-field values from the parsed page.
        var viewState = htmlDoc.DocumentNode.SelectSingleNode("//input[@id='__VIEWSTATE']").GetAttributeValue("value", string.Empty);
        var eventValidation = htmlDoc.DocumentNode.SelectSingleNode("//input[@id='__EVENTVALIDATION']").GetAttributeValue("value", string.Empty);

        // Post them back together with the selected date.
        var content = new FormUrlEncodedContent(new Dictionary<string, string>
        {
            {"__VIEWSTATE", viewState},
            {"__EVENTVALIDATION", eventValidation},
            {"ctl00$ContentPlaceHolder1$drp_auction_date", "17/02/2016"}
        });

        var res = await client.PostAsync("/frm_scduled_items.aspx", content);
        var resContent = await res.Content.ReadAsStringAsync();

        Console.WriteLine(resContent);
    }

From there, you'll want to parse the resulting table to get useful information. If you want to crawl through the DataGrid's pages, you'll need to get updated EventValidation and ViewState values and simulate an additional post for each page.
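
As a rough sketch of both follow-up steps, the code below reads a table's rows with the HTML Agility Pack and assembles the form fields for a paging postback. WebForms postbacks triggered by links set the `__EVENTTARGET` and `__EVENTARGUMENT` fields. Everything specific here is an assumption: the `GridScraper` class, the table id `grid`, and the sample HTML are made up, and the event target is simply the control name from the question, which may or may not be the right pager link on the live page.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

public static class GridScraper
{
    // Returns each row of the matched table as a list of trimmed, entity-decoded cell texts.
    public static List<List<string>> ParseRows(string html, string tableXPath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var rows = new List<List<string>>();
        var table = doc.DocumentNode.SelectSingleNode(tableXPath);
        if (table == null) return rows; // table not found: return an empty result

        foreach (var tr in table.Descendants("tr"))
        {
            var cells = tr.ChildNodes
                .Where(n => n.Name == "td" || n.Name == "th")
                .Select(n => HtmlEntity.DeEntitize(n.InnerText).Trim())
                .ToList();
            if (cells.Count > 0) rows.Add(cells);
        }
        return rows;
    }

    // Builds the form fields for a postback: fresh hidden fields from the latest
    // response, the event target and argument, and the selected date.
    public static Dictionary<string, string> BuildPostBack(
        IDictionary<string, string> hiddenFields,
        string eventTarget, string eventArgument,
        string dateField, string date)
    {
        var form = new Dictionary<string, string>(hiddenFields);
        form["__EVENTTARGET"] = eventTarget;
        form["__EVENTARGUMENT"] = eventArgument;
        form[dateField] = date;
        return form;
    }
}

public static class Program
{
    public static void Main()
    {
        // Made-up table standing in for the grid in the response HTML.
        string sample =
            "<table id=\"grid\">" +
            "<tr><th>Item</th><th>Date</th></tr>" +
            "<tr><td>Lot 1</td><td>17/02/2016</td></tr>" +
            "</table>";

        foreach (var row in GridScraper.ParseRows(sample, "//table[@id='grid']"))
        {
            Console.WriteLine(string.Join(" | ", row));
        }

        // Hidden-field values would come from the most recent response, not constants.
        var hidden = new Dictionary<string, string>
        {
            {"__VIEWSTATE", "value-from-latest-response"},
            {"__EVENTVALIDATION", "value-from-latest-response"}
        };
        var form = GridScraper.BuildPostBack(
            hidden,
            "ctl00$ContentPlaceHolder1$gridedit$ctl14$ctl02", // assumed pager link
            "",
            "ctl00$ContentPlaceHolder1$drp_auction_date", "17/02/2016");
        Console.WriteLine(form.Count + " form fields ready to post");
    }
}
```

The resulting dictionary can be wrapped in a `FormUrlEncodedContent` and posted as in the code above. Each response returns new hidden-field values, so they need to be re-read before requesting the next page.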
