
HtmlAgilityPack C# error 403 Forbidden

I use HtmlAgilityPack to get information from a site (the URL is in the code). Here's the code:

int i=2449520;

.....................

web.OverrideEncoding = Encoding.UTF8;
web.UserAgent = "Mozilla/5.0 (Windows NT 6.2; WOW64; rv:19.0) Gecko/20100101 Firefox/19.0";
doc = web.Load("http://ru-patent.info/24/49/" + i + ".html");
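var nodes = doc.DocumentNode.SelectNodes("//div[@style='padding:10px; border:#999 dotted 1px; background-color:#FFF; background-image:url(/imgs/back.gif);']");
// SelectNodes returns null when nothing matches (for example on an error page)
if (nodes != null)
{
    // Build the regex once rather than once per node
    Regex regex = new Regex(@"\sRU\s\d+");
    foreach (var t in nodes)
    {
        Match match = regex.Match(t.InnerText);
        sw.WriteLine(i.ToString());
        // Write every "RU <number>" occurrence found in this node
        while (match.Success)
        {
            sw.WriteLine(match.ToString());
            match = match.NextMatch();
        }
        sw.WriteLine('\n');
    }
}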
i++;

I also use a timer with an interval of 10 seconds, and there are more than a thousand pages that I need to get information from. But after about 30 pages I get a 403 Forbidden error. How can I get around this?

A 403 response means that the server understood your request but refuses to fulfil it. My guess is that this is the server's protection against DDoS. You can send the requests from different servers (with different IP addresses) or try taking a longer break between requests. It is also always a good idea to ask the site owners what the best way to get their data is.
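As a rough illustration of the "take a break between requests" suggestion, here is a minimal sketch that waits progressively longer after each 403 instead of retrying at a fixed interval. It uses the standard HttpClient rather than HtmlWeb so that the status code is easy to inspect. The class name PoliteFetcher, the method names, and the 30-second / 10-minute values are all hypothetical, and nothing here is known to match what this particular site tolerates.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class PoliteFetcher
{
    // Delay before the next try after failed attempt `attempt` (0-based):
    // baseDelay * 2^attempt, capped at maxDelay.
    public static TimeSpan BackoffDelay(int attempt, TimeSpan baseDelay, TimeSpan maxDelay)
    {
        double seconds = baseDelay.TotalSeconds * Math.Pow(2, attempt);
        return TimeSpan.FromSeconds(Math.Min(seconds, maxDelay.TotalSeconds));
    }

    // Fetches a page, pausing longer after each 403 response.
    // Returns null if the server still refuses after maxAttempts tries.
    public static async Task<string> FetchAsync(HttpClient client, string url, int maxAttempts = 4)
    {
        for (int attempt = 0; attempt < maxAttempts; attempt++)
        {
            using (HttpResponseMessage response = await client.GetAsync(url))
            {
                if (response.StatusCode != HttpStatusCode.Forbidden)
                {
                    // Any other error status surfaces as an exception.
                    response.EnsureSuccessStatusCode();
                    return await response.Content.ReadAsStringAsync();
                }
            }

            if (attempt + 1 < maxAttempts)
            {
                // Assumed values: start at 30 s, never wait more than 10 min.
                await Task.Delay(BackoffDelay(attempt,
                    TimeSpan.FromSeconds(30), TimeSpan.FromMinutes(10)));
            }
        }
        return null;
    }
}
```

The returned HTML string can then be parsed with HtmlAgilityPack by creating an HtmlDocument and calling LoadHtml on it, after which the SelectNodes query from the question works unchanged. If the server blocks by IP address for a long period, no delay on the client side will help, which is one more reason to contact the site owners.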
