Retrieve DOM data from site
Is there any way to retrieve the DOM results that appear when I click "older posts" on this site:
http://www.facebook.com/FamilyGuy
using C# or Java? I heard that it is possible to execute a script with onclick and get the results. How can I execute this script:
onclick="(JSCC.get('j4eb9ad57ab8a19f468880561') && JSCC.get('j4eb9ad57ab8a19f468880561').getHandler())(); return false;"
I think the older posts link sends an Ajax request and appends the response to the page. (I'm not sure. You should check the page source.)
You can emulate this behavior in C#, Java, and JavaScript (you already have the code for JavaScript).
Edit:
It seems that Facebook uses some sort of internal API (JSCC) to load the content, and it is undocumented.
I don't know about the Facebook Developers APIs (you may want to check those first), but if you want to emulate exactly what happens in your browser, you can use TamperData to intercept the GET requests sent when you click the more posts link, and find the request URL and its parameters.
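Once a request URL has been captured, it helps to split it into its parameters to see which one controls paging. Below is a minimal Java sketch using only the standard library. The class name, the URL, and the parameter names in `main` are made up for illustration and are not Facebook's real endpoint.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

final class CapturedRequest {
    /**
     * Splits the query string of a captured URL into decoded name/value pairs,
     * keeping the order in which they appear. If a name repeats, the last value wins.
     */
    static Map<String, String> queryParams(String url) {
        Map<String, String> params = new LinkedHashMap<>();
        String query = URI.create(url).getRawQuery();
        if (query == null || query.isEmpty()) {
            return params; // no query string at all
        }
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue;
            }
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(URLDecoder.decode(name, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }

    public static void main(String[] args) {
        // Hypothetical captured URL; substitute the one you intercepted.
        String captured = "http://example.com/ajax/more.php?page_id=123&offset=20&q=family%20guy";
        queryParams(captured).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

Comparing the parameters of two consecutive captured requests usually shows which value changes from page to page.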
After you get this information, you have to log in to your account from your application and get the authentication cookie.
C# sample code, as you requested:
// Requires: using System; using System.IO; using System.Net;
private CookieContainer GetCookieContainer(string loginURL, string userName, string password)
{
    // Share one cookie container across both requests so that cookies set by the login page are sent back with the POST.
    var cookies = new CookieContainer();

    // Fetch the login page first.
    var webRequest = (HttpWebRequest)WebRequest.Create(loginURL);
    webRequest.CookieContainer = cookies;
    string responseData;
    using (var response = webRequest.GetResponse())
    using (var responseReader = new StreamReader(response.GetResponseStream()))
    {
        responseData = responseReader.ReadToEnd();
    }

    // Now you may need to extract some values from the login form (in responseData) and build the POST data with your username and password.
    // I don't know what exactly you need to POST, but again a TamperData observation will help you find out.
    // I emphasize that this is just an example; the field names are placeholders. The values are URL-encoded.
    string postData = String.Format("UserName={0}&Password={1}",
        Uri.EscapeDataString(userName), Uri.EscapeDataString(password));

    // Post the login form.
    webRequest = (HttpWebRequest)WebRequest.Create(loginURL);
    webRequest.Method = "POST";
    webRequest.ContentType = "application/x-www-form-urlencoded";
    webRequest.CookieContainer = cookies;

    // Write the form values into the request body.
    using (var requestWriter = new StreamWriter(webRequest.GetRequestStream()))
    {
        requestWriter.Write(postData);
    }

    // Cookies from the login response are stored in the container.
    webRequest.GetResponse().Close();
    return cookies;
}
Then you can perform GET requests with the cookie you have, on the URL you got from analyzing the JSCC.get().getHandler() requests with TamperData, and eventually you'll get what you want as a response stream:
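var webRequest = (HttpWebRequest)WebRequest.Create(url);
// Log in at the login page's URL (loginURL), not the content URL, and attach the resulting cookies.
webRequest.CookieContainer = GetCookieContainer(loginURL, userName, password);
var responseStream = webRequest.GetResponse().GetResponseStream();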
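Since the question also mentions Java, here is a rough Java counterpart of the two C# snippets above, sketched with the standard java.net.http client (Java 11+). It rests on the same assumptions as the C# code: the field names UserName and Password are placeholders, the class name CookieSession is made up for illustration, and the real login form may need extra hidden fields.

```java
import java.io.IOException;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

final class CookieSession {
    // One client per session: its CookieManager keeps the cookies set at login.
    private final HttpClient client = HttpClient.newBuilder()
            .cookieHandler(new CookieManager(null, CookiePolicy.ACCEPT_ALL))
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();

    /** URL-encodes form fields into an application/x-www-form-urlencoded body. */
    static String formBody(Map<String, String> fields) {
        return fields.entrySet().stream()
                .map(e -> URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8)
                        + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8))
                .collect(Collectors.joining("&"));
    }

    /** Posts the login form and returns the HTTP status; cookies stay in the client. */
    int login(String loginUrl, Map<String, String> fields) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(loginUrl))
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody(fields)))
                .build();
        return client.send(request, HttpResponse.BodyHandlers.discarding()).statusCode();
    }

    /** Fetches a URL with the stored cookies and returns the response body. */
    String get(String url) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) {
        // Placeholder field names, as in the C# sample; inspect the real login form.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("UserName", "me@example.com");
        fields.put("Password", "p&ss word");
        System.out.println(formBody(fields));
        // With real URLs (hypothetical variable names), the flow would be:
        //   CookieSession session = new CookieSession();
        //   session.login(loginUrl, fields);
        //   String html = session.get(capturedUrl);
    }
}
```

The same client is reused for the login and for later GET requests, which is what lets the authentication cookie carry over.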
You can also use Selenium for browser automation. It also has C# and Java APIs (I have no experience using Selenium).
Facebook loads its content dynamically with AJAX. You can use a tool like Firebug to examine what kind of request is made, and then replicate it.
Or you can use a browser rendering engine like WebKit to process the JavaScript for you and expose the resulting HTML: http://webscraping.com/blog/Scraping-JavaScript-webpages-with-webkit/