
Scrapy Web Scraping and Facebook

Any thoughts on why I can't log in? I've been trying to log in to Facebook and LinkedIn using the same method, with no success. I'm using the most recent version of Scrapy. As a test I'm trying to reach 'Messages', but I know the login isn't working because I get redirected back to the login page. The same thing happens on LinkedIn.

import logging

import scrapy
from scrapy.http import FormRequest, Request

from linkedIn.items import LinkedinItem
# from spider.settings import JsonWriterPipeline


class MySpider(scrapy.Spider):
    name = 'fb'
    allowed_domains = ['facebook.com']
    start_urls = ['https://login.facebook.com/login.php']

    def parse(self, response):
        # Fill in the login form found in the response and submit it.
        return [FormRequest.from_response(response,
                    formname='login_form',
                    formdata={'email': 'my_email@example.com',
                              'pass': 'test!'},
                    callback=self.after_login)]

    def after_login(self, response):
        # Check that login succeeded before going on.
        # response.text is the decoded body; response.body is bytes.
        if "the password you entered is incorrect" in response.text:
            self.log("Login failed", level=logging.ERROR)
            return
        self.log("Login was successful!")
        self.log(response.text)
        return Request(url="https://facebook.com/messages",
                       callback=self.parse_items)

    def parse_items(self, response):
        items = []
        for title in response.xpath("//title"):
            item = LinkedinItem()
            # Query relative to the current node, not the whole document.
            item['friendName'] = title.xpath("text()").extract()
            # item['numberOffriends'] = title.xpath("some path here").extract().pop()
            items.append(item)
        return items

Both Facebook and LinkedIn use CSRF tokens. You first have to GET the page containing the login form, parse the HTML to extract the CSRF token, and only then make a POST request with the username, password and CSRF token.
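
The parse-and-build step described above could be sketched roughly like this, using only the Python standard library so it runs without any network access. The sample HTML, the `csrf_token` field name, and the helper names `HiddenInputParser` and `build_login_payload` are all made up for illustration; the real login pages use their own field names and markup, so treat this as a sketch of the idea rather than working login code for either site.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenInputParser(HTMLParser):
    """Collects name/value pairs of <input type="hidden"> elements."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        if tag != "input":
            return
        attr = dict(attrs)
        if (attr.get("type") or "").lower() == "hidden" and attr.get("name"):
            self.fields[attr["name"]] = attr.get("value") or ""


def build_login_payload(login_html, email, password):
    """Merge the form's hidden fields (token included) with the credentials."""
    parser = HiddenInputParser()
    parser.feed(login_html)
    payload = dict(parser.fields)
    payload["email"] = email
    payload["pass"] = password
    return payload


# Hypothetical login page standing in for the HTML returned by the GET request.
SAMPLE_LOGIN_HTML = """
<form id="login_form" action="/login.php" method="post">
  <input type="hidden" name="csrf_token" value="abc123">
  <input type="text" name="email">
  <input type="password" name="pass">
</form>
"""

payload = build_login_payload(SAMPLE_LOGIN_HTML, "my_email@example.com", "test!")
body = urlencode(payload)  # form-encoded body to send in the POST request
print(body)
```

In a spider, the HTML would come from the response to the initial GET, and the resulting payload would be sent as the form data of the POST request.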
