
Retrieve DOM data from site

Is there any way to retrieve the resulting DOM when I click Older Posts on this site:

http://www.facebook.com/FamilyGuy

using C# or Java? I heard that it is possible to execute the onclick script and get the results. How can I execute this script:

onclick="(JSCC.get('j4eb9ad57ab8a19f468880561') && JSCC.get('j4eb9ad57ab8a19f468880561').getHandler())(); return false;"

I think the older posts link sends an Ajax request and appends the response to the page (I'm not sure; you should check the page source).

You can emulate this behavior in C#, Java, or JavaScript (you already have the JavaScript code).

Edit:

It seems that Facebook uses some sort of internal API (JSCC) to load the content, and it's undocumented.

I don't know about the Facebook developer APIs (you may want to check those first), but if you want to emulate exactly what happens in your browser, you can use TamperData to intercept the GET requests sent when you click the more posts link and find the request URL and its parameters.
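If you are working in Java, a small helper like the one below can split the request URL you copied from TamperData into its parameters. It can then rebuild the URL after you change one of them, for example a paging value. This is a sketch that assumes Java 10+ for the `Charset` overloads of `URLEncoder`/`URLDecoder`. The class name `CapturedRequest`, the example.com URL and the `page`/`q` parameter names are hypothetical placeholders, not Facebook's real endpoint or parameters.

```java
import java.net.URI;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper: take a captured request URL apart and put it back together. */
final class CapturedRequest {
    private CapturedRequest() {}

    /** Decodes the query string of a URL into an insertion-ordered map. */
    static Map<String, String> parseQuery(String url) {
        Map<String, String> params = new LinkedHashMap<>();
        String query = URI.create(url).getRawQuery(); // still percent-encoded
        if (query == null || query.isEmpty()) {
            return params;
        }
        for (String pair : query.split("&")) {
            if (pair.isEmpty()) {
                continue; // tolerate "a=1&&b=2"
            }
            int eq = pair.indexOf('=');
            String key = eq >= 0 ? pair.substring(0, eq) : pair;
            String value = eq >= 0 ? pair.substring(eq + 1) : "";
            params.put(URLDecoder.decode(key, StandardCharsets.UTF_8),
                       URLDecoder.decode(value, StandardCharsets.UTF_8));
        }
        return params;
    }

    /** URL-encodes the parameters and appends them to a base URL that has no query. */
    static String buildUrl(String base, Map<String, String> params) {
        if (params.isEmpty()) {
            return base;
        }
        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            query.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return base + "?" + query;
    }
}
```

For example, `parseQuery("http://example.com/ajax/posts?page=2&q=family%20guy")` yields `{page=2, q=family guy}`. After changing `page` to `3`, `buildUrl("http://example.com/ajax/posts", params)` gives `http://example.com/ajax/posts?page=3&q=family+guy`, because `URLEncoder` uses form encoding, where a space becomes `+`.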

Once you have this information, you have to log in to your account from your application and get the authentication cookie.
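After the login request goes through, the authentication cookie ends up in whatever cookie store your HTTP client uses. In Java that can be a `java.net.CookieManager`. A minimal sketch for checking whether the expected cookie arrived might look like the code below. The class name `AuthCookies` and the cookie name you pass in are assumptions: which cookie actually carries the session is something you would have to read from TamperData.

```java
import java.net.CookieManager;
import java.net.CookieStore;
import java.net.HttpCookie;
import java.util.Optional;

/** Hypothetical helper for inspecting the cookies collected after logging in. */
final class AuthCookies {
    private AuthCookies() {}

    /** Returns the first stored (non-expired) cookie with the given name, if any. */
    static Optional<HttpCookie> find(CookieStore store, String name) {
        return store.getCookies().stream()
                .filter(c -> c.getName().equals(name))
                .findFirst();
    }

    /** True if a cookie with this name has been stored, e.g. by the login response. */
    static boolean isLoggedIn(CookieManager manager, String authCookieName) {
        return find(manager.getCookieStore(), authCookieName).isPresent();
    }
}
```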

C# sample code as you requested:

using System;
using System.IO;
using System.Net;

private CookieContainer GetCookieContainer(string loginURL, string userName, string password)
{
    // One cookie container for both requests, so cookies set by the login page
    // are sent back with the login POST.
    var cookies = new CookieContainer();

    // GET the login page
    var webRequest = (HttpWebRequest)WebRequest.Create(loginURL);
    webRequest.CookieContainer = cookies;
    string responseData;
    using (var response = webRequest.GetResponse())
    using (var responseReader = new StreamReader(response.GetResponseStream()))
    {
        responseData = responseReader.ReadToEnd();
    }

    // Now you may need to extract some values from the login form (responseData) and build the POST data with your username and password.
    // I don't know exactly what you need to POST, but again a TamperData observation will help you find out.
    // I emphasize that the field names below are just an example; the values are URL-encoded.
    string postData = String.Format("UserName={0}&Password={1}",
        Uri.EscapeDataString(userName), Uri.EscapeDataString(password));

    // post the login form
    webRequest = (HttpWebRequest)WebRequest.Create(loginURL);
    webRequest.Method = "POST";
    webRequest.ContentType = "application/x-www-form-urlencoded";
    webRequest.CookieContainer = cookies;

    // write the form values into the request message
    using (var requestWriter = new StreamWriter(webRequest.GetRequestStream()))
    {
        requestWriter.Write(postData);
    }

    // cookies set by the login response are now stored in the container
    webRequest.GetResponse().Close();
    return cookies;
}
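Extracting values from the login form usually means collecting the hidden `<input>` fields, such as tokens, so they can be posted back together with the user name and password. Below is a rough Java sketch of that step. It assumes simple markup with double-quoted attributes. `LoginForm` and `hiddenFields` are hypothetical names. A regex is fragile against real-world HTML, and an HTML parser is the safer choice. Entity-encoded values are not decoded here.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: collect name/value pairs of hidden <input> fields from a login page. */
final class LoginForm {
    private LoginForm() {}

    // Matches a whole <input ...> tag, case-insensitively.
    private static final Pattern INPUT = Pattern.compile("<input\\b[^>]*>", Pattern.CASE_INSENSITIVE);

    /** Returns the value of a double-quoted attribute inside one tag, or null if absent. */
    private static String attr(String tag, String name) {
        Matcher m = Pattern.compile("\\s" + name + "\\s*=\\s*\"([^\"]*)\"", Pattern.CASE_INSENSITIVE)
                .matcher(tag);
        return m.find() ? m.group(1) : null;
    }

    /** Maps each hidden input's name to its value (empty string if it has no value). */
    static Map<String, String> hiddenFields(String html) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher tags = INPUT.matcher(html);
        while (tags.find()) {
            String tag = tags.group();
            String name = attr(tag, "name");
            if ("hidden".equalsIgnoreCase(attr(tag, "type")) && name != null) {
                String value = attr(tag, "value");
                fields.put(name, value == null ? "" : value);
            }
        }
        return fields;
    }
}
```

The resulting map, together with the user name and password fields, can then be URL-encoded into the POST body, the same way `postData` is built in the C# code.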

Then you can perform GET requests with that cookie on the URL you found by watching the JSCC.get().getHandler() requests in TamperData, and eventually you'll get what you want as a response stream:

// url: the request URL found with TamperData; loginURL: the login form's URL
var webRequest = (HttpWebRequest)WebRequest.Create(url);
webRequest.CookieContainer = GetCookieContainer(loginURL, userName, password);
var responseStream = webRequest.GetResponse().GetResponseStream();
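Since the question also asks about Java, here is a rough Java 11+ counterpart of the two C# snippets. It uses `java.net.http.HttpClient` with a `java.net.CookieManager`. Because one client keeps the cookie store, you log in once and then reuse the session for later GETs instead of logging in again before each request. This is a sketch and has not been tested against Facebook. The class name `SessionClient` is hypothetical, and you still have to find the login URL, form field names and request URL with TamperData.

```java
import java.io.IOException;
import java.net.CookieManager;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Sketch: one HttpClient whose CookieManager keeps the login cookies for later requests. */
final class SessionClient {
    private final CookieManager cookies = new CookieManager();
    private final HttpClient client = HttpClient.newBuilder()
            .cookieHandler(cookies) // stores Set-Cookie headers, sends matching cookies back
            .followRedirects(HttpClient.Redirect.NORMAL)
            .build();

    /** Builds the form POST; formBody must already be URL-encoded ("UserName=...&Password=..."). */
    static HttpRequest loginRequest(URI loginUri, String formBody) {
        return HttpRequest.newBuilder(loginUri)
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(formBody))
                .build();
    }

    /** Sends the login form and returns the HTTP status code; cookies land in the store. */
    int login(URI loginUri, String formBody) throws IOException, InterruptedException {
        return client.send(loginRequest(loginUri, formBody), HttpResponse.BodyHandlers.discarding())
                .statusCode();
    }

    /** GETs a URL (e.g. the one found with TamperData) with the stored cookies attached. */
    String get(URI uri) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    CookieManager cookies() {
        return cookies;
    }
}
```

Typical use: create one `SessionClient`, call `login(...)` once with the encoded form body, then call `get(...)` with the captured URL for each page of older posts.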

You can also use Selenium for browser automation; it has C# and Java APIs as well (I have no experience with Selenium myself).

Facebook loads its content dynamically with AJAX. You can use a tool like Firebug to examine what kind of request is made, and then replicate it.
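One practical detail arises when you replicate a request seen in Firebug from Java. `java.net.http.HttpClient` manages some headers itself and throws `IllegalArgumentException` from `HttpRequest.Builder.header` if you set them by hand. Below is a minimal sketch that replays the captured headers but skips those. The class name `ReplayRequest` and the example headers are assumptions. Depending on the JDK version, a few more header names may be restricted than the ones listed here.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: replay headers copied from Firebug onto a java.net.http GET request. */
final class ReplayRequest {
    private ReplayRequest() {}

    // Headers HttpClient sets itself and rejects from callers (compared in lower case).
    private static final Set<String> RESTRICTED =
            Set.of("connection", "content-length", "expect", "host", "upgrade");

    static HttpRequest build(URI uri, Map<String, String> capturedHeaders) {
        HttpRequest.Builder builder = HttpRequest.newBuilder(uri).GET();
        for (Map.Entry<String, String> h : capturedHeaders.entrySet()) {
            if (!RESTRICTED.contains(h.getKey().toLowerCase(Locale.ROOT))) {
                builder.header(h.getKey(), h.getValue());
            }
        }
        return builder.build();
    }
}
```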

Or you can use a browser rendering engine like WebKit to process the JavaScript for you and expose the resulting HTML: http://webscraping.com/blog/Scraping-JavaScript-webpages-with-webkit/
