Retrieve DOM data from site

Is there a way to retrieve the resulting DOM after I click "older posts" on this page:

http://www.facebook.com/FamilyGuy

using C# or Java? I heard that it is possible to execute a script with onclick and get the results. How can I execute this script:

onclick="(JSCC.get('j4eb9ad57ab8a19f468880561') && JSCC.get('j4eb9ad57ab8a19f468880561').getHandler())(); return false;"

I think the "older posts" link sends an Ajax request and appends the response to the page. (I'm not sure; you should check the page source.)

You can emulate this behavior in C#, Java, or JavaScript (you already have the code for JavaScript).

Edit:

It seems that Facebook uses some sort of undocumented internal API (JSCC) to load the content.

I don't know about the Facebook Developers APIs (you may want to check those first), but if you want to emulate exactly what happens in your browser, you can use TamperData to intercept the GET request sent when you click the "more posts" link and find the request URL and its parameters.
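
Once a request URL has been captured, it can help to split it into its parameters so that individual values (an offset or page marker, for instance) can be changed before the request is replayed. A minimal Java sketch, assuming Java 11 or later; the class name QueryParams and the example URL are made up for illustration and are not Facebook's actual endpoint:

```java
import java.net.URI;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

public class QueryParams {
    /** Splits the query string of a URL into decoded name/value pairs, in order. */
    public static Map<String, String> parse(String url) {
        Map<String, String> params = new LinkedHashMap<>();
        String query = URI.create(url).getRawQuery();
        if (query == null || query.isEmpty()) {
            return params;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            params.put(URLDecoder.decode(name, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }

    public static void main(String[] args) {
        // Made-up URL standing in for whatever the intercepted request shows.
        String captured = "https://example.com/ajax/more?page_id=123&offset=20&q=a%20b";
        parse(captured).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```

If a parameter name appears more than once, this sketch keeps only the last value; a real request may need a multi-valued map instead.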

After you get this information, your application has to log in to your account and obtain the authentication cookie.

C# sample code as you requested:

// Requires: using System; using System.IO; using System.Net;
private CookieContainer GetCookieContainer(string loginURL, string userName, string password)
{
    // One container shared by both requests, so cookies set by the login page are sent back with the POST.
    var cookies = new CookieContainer();

    // Fetch the login page first; its form may contain hidden fields that have to be posted back.
    var webRequest = (HttpWebRequest)WebRequest.Create(loginURL);
    webRequest.CookieContainer = cookies;
    string responseData;
    using (var response = webRequest.GetResponse())
    using (var responseReader = new StreamReader(response.GetResponseStream()))
    {
        responseData = responseReader.ReadToEnd();
    }

    // Now you may need to extract some values from the login form (responseData) and build the POST data with your username and password.
    // I don't know what exactly you need to POST but again a TamperData observation will help you to find out.
    string postData = String.Format("UserName={0}&Password={1}",
        Uri.EscapeDataString(userName), Uri.EscapeDataString(password)); // I emphasize that this is just an example.

    // post the login form
    webRequest = (HttpWebRequest)WebRequest.Create(loginURL);
    webRequest.Method = "POST";
    webRequest.ContentType = "application/x-www-form-urlencoded";
    webRequest.CookieContainer = cookies;

    // write the form values into the request message
    using (var requestWriter = new StreamWriter(webRequest.GetRequestStream()))
    {
        requestWriter.Write(postData);
    }

    webRequest.GetResponse().Close();
    return cookies;
}
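
Since the question also asks about Java, here is a rough Java sketch of just the form-body step: each name and value is URL-encoded before being joined, so characters such as & or spaces in a password do not corrupt the POST data. It assumes Java 10 or later; the field names UserName and Password are the same placeholders used in the C# sample, and the class name FormBody is made up for illustration.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

public class FormBody {
    /** Joins name/value pairs as application/x-www-form-urlencoded text. */
    public static String encode(Map<String, String> fields) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            body.add(URLEncoder.encode(field.getKey(), StandardCharsets.UTF_8)
                    + "=" + URLEncoder.encode(field.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // Placeholder field names; the real login form decides what must be posted.
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("UserName", "me@example.com");
        fields.put("Password", "p&ss word");
        System.out.println(encode(fields));
        // prints UserName=me%40example.com&Password=p%26ss+word
    }
}
```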

Then you can perform GET requests with the cookie you have, on the URL you found with TamperData by analyzing the request that JSCC.get().getHandler() sends, and eventually you'll get what you want as a response stream:

var webRequest = WebRequest.Create(url) as HttpWebRequest;
// Log in against the login page URL (loginURL), not the content URL being fetched.
webRequest.CookieContainer = GetCookieContainer(loginURL, userName, password);
var responseStream = webRequest.GetResponse().GetResponseStream();
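
A comparable Java sketch of the authenticated GET, assuming Java 11 or later and its built-in java.net.http client. The class name AuthenticatedGet, the example.com URLs, and the cookie name and value are placeholders; in practice the cookie manager would be filled by sending the login request through the same client.

```java
import java.io.IOException;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpCookie;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class AuthenticatedGet {
    /** Builds a client that stores cookies in, and sends them from, the given manager. */
    public static HttpClient clientWith(CookieManager cookies) {
        return HttpClient.newBuilder().cookieHandler(cookies).build();
    }

    /** Builds the GET request for the URL observed in the browser. */
    public static HttpRequest get(String url) {
        return HttpRequest.newBuilder(URI.create(url)).GET().build();
    }

    /** Sends the request and returns the response body as text. */
    public static String fetch(HttpClient client, String url)
            throws IOException, InterruptedException {
        return client.send(get(url), HttpResponse.BodyHandlers.ofString()).body();
    }

    public static void main(String[] args) {
        CookieManager cookies = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
        // Placeholder cookie; a real one would come from the login response.
        HttpCookie session = new HttpCookie("session", "placeholder");
        session.setPath("/");
        cookies.getCookieStore().add(URI.create("https://example.com/"), session);

        HttpClient client = clientWith(cookies);
        HttpRequest request = get("https://example.com/ajax/more-posts?page=2");
        System.out.println(request.method() + " " + request.uri());
        // fetch(client, request.uri().toString()) would perform the actual call.
    }
}
```

The main method only builds the request and prints it, so it runs without network access; calling fetch is what actually contacts the server.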

You can also use Selenium for browser automation. It has C# and Java APIs as well (I have no experience with Selenium).

Facebook loads its content dynamically with AJAX. You can use a tool like Firebug to examine what kind of request is made, and then replicate it.

Or you can use a browser rendering engine like WebKit to process the JavaScript for you and expose the resulting HTML: http://webscraping.com/blog/Scraping-JavaScript-webpages-with-webkit/
