Grab the contents of a Drupal website that is secured with a login form
I would like to grab some content from a website that is built with Drupal. The challenge is that I need to log in to the site before I can access the page I want to scrape. Is there a way to automate this login process in my C# code so that I can grab the secured content?
To access the secured content, you need to store cookies and send them with every request to the server. Start with the request that sends your login information, then keep the session cookie the server returns (it is your proof that you are who you say you are).
You can use System.Windows.Forms.WebBrowser for an out-of-the-box solution that handles cookies for you, at the cost of less control.
My preferred method is to use System.Net.HttpWebRequest to send and receive all web data, and then use the HtmlAgilityPack to parse the returned HTML into a Document Object Model (DOM) that is easy to read from.
The trick to getting System.Net.HttpWebRequest to work is that you must create a long-lived System.Net.CookieContainer that keeps track of your login information (and anything else the server expects you to keep track of). The good news is that HttpWebRequest takes care of all of this for you if you provide the container.
You need a new HttpWebRequest for each call you make, so you must set its .CookieContainer to the same object every time. Here is an example:
using System.Net;

public void TestConnect()
{
    // One container shared by every request, so cookies persist between calls.
    CookieContainer cookieJar = new CookieContainer();

    HttpWebRequest request = (HttpWebRequest)WebRequest.Create("http://www.mysite.com/login.htm");
    request.CookieContainer = cookieJar;
    HttpWebResponse response = (HttpWebResponse)request.GetResponse();
    // do page parsing and request setting here
    response.Close(); // release the connection before the next request

    request = (HttpWebRequest)WebRequest.Create("http://www.mysite.com/submit_login.htm");
    // add specific page parameters here
    request.CookieContainer = cookieJar;
    response = (HttpWebResponse)request.GetResponse();
    response.Close();

    request = (HttpWebRequest)WebRequest.Create("http://www.mysite.com/secured_page.htm");
    request.CookieContainer = cookieJar;
    // this will now work since you have saved your authentication cookies in 'cookieJar'
    response = (HttpWebResponse)request.GetResponse();
    response.Close();
}
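The "add specific page parameters here" step is usually a form POST. Below is a minimal sketch of how that step might look, assuming the login form accepts an application/x-www-form-urlencoded body. The class name LoginSketch and the helpers BuildFormBody and PostForm are hypothetical names introduced for illustration; they are not part of the original answer or of any library.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class LoginSketch
{
    // Encode name/value pairs as an application/x-www-form-urlencoded body.
    public static string BuildFormBody(IEnumerable<KeyValuePair<string, string>> fields)
    {
        var parts = new List<string>();
        foreach (var field in fields)
        {
            parts.Add(Uri.EscapeDataString(field.Key) + "=" + Uri.EscapeDataString(field.Value));
        }
        return string.Join("&", parts);
    }

    // POST the form body to 'url', sharing 'cookieJar' so that any session
    // cookie the server sets is stored for later requests.
    public static string PostForm(string url, string formBody, CookieContainer cookieJar)
    {
        byte[] payload = Encoding.UTF8.GetBytes(formBody);

        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "POST";
        request.ContentType = "application/x-www-form-urlencoded";
        request.ContentLength = payload.Length;
        request.CookieContainer = cookieJar;

        using (Stream body = request.GetRequestStream())
        {
            body.Write(payload, 0, payload.Length);
        }

        // Disposing the response releases the connection for the next request.
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }
}
```

To use it, build the body from the fields your login form actually contains and pass the same cookieJar you use for the later requests. Drupal login forms commonly use fields such as name, pass and form_id, plus hidden fields that differ per site and version, but treat those names as an assumption and check the form's HTML first.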
http://msdn.microsoft.com/en-us/library/system.windows.forms.webbrowser.aspx
HttpWebRequest Class
http://msdn.microsoft.com/en-us/library/system.net.httpwebrequest.cookiecontainer.aspx