Scrapy Web Scraping and Facebook
Any thoughts on why I can't log in? I've been trying to log in to Facebook and LinkedIn using the same method, with no success. I'm using the most recent version of Scrapy. I'm trying to get to 'Messages' as a test, and I know it doesn't work because it redirects me back to the login page. The same thing happens on LinkedIn.
```python
import scrapy
from scrapy.http import FormRequest, Request
from linkedIn.items import LinkedinItem  # item class defined in the project's items.py
#from spider.settings import JsonWriterPipeline


# BaseSpider and scrapy.contrib no longer exist in current Scrapy, and no crawl
# rules are used here, so the plain scrapy.Spider base class is enough
# (the CrawlSpider docs also warn against overriding parse()).
class MySpider(scrapy.Spider):
    name = 'fb'
    allowed_domains = ['facebook.com']
    start_urls = ['https://login.facebook.com/login.php']

    def parse(self, response):
        # Submit the form named "login_form" found on the fetched login page
        return FormRequest.from_response(
            response,
            formname='login_form',
            formdata={'email': 'my_email@example.com', 'pass': 'test!'},
            callback=self.after_login,
        )

    def after_login(self, response):
        # Check that login succeeded before going on.
        # response.text is a str; response.body is bytes and cannot be
        # searched for a str in Python 3.
        if "the password you entered is incorrect" in response.text:
            self.logger.error("Login failed")
            return
        self.logger.info("Login was successful")
        self.logger.debug(response.text)
        return Request(url="https://facebook.com/messages",
                       callback=self.parse_items)

    def parse_items(self, response):
        for title in response.xpath("//title"):
            item = LinkedinItem()
            # Query relative to the current <title> node, not the whole page
            item['friendName'] = title.xpath("text()").get()
            #item['numberOffriends'] = title.xpath("some path here").get()
            yield item
```
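The check in `after_login` depends on one exact error message, so a failed login that shows different wording is reported as a success. Since the symptom described is a redirect back to the login page, another option is to look at the URL the response ended up on. The helper below is a minimal sketch under one assumption: the login page has "login" in its URL path, as the `login.php` URL in `start_urls` does. The name `looks_like_login_page` is hypothetical, not part of Scrapy.

```python
from urllib.parse import urlparse


def looks_like_login_page(url, markers=("login",)):
    """Heuristic: True if the URL path contains one of `markers`.

    Assumption: the site's login page has "login" in its path. Only the
    path is checked, so "login" appearing in the query string
    (e.g. ?ref=login) is ignored.
    """
    path = urlparse(url).path.lower()
    return any(marker in path for marker in markers)


if __name__ == "__main__":
    for url in ("https://login.facebook.com/login.php",
                "https://facebook.com/messages",
                "https://facebook.com/messages?ref=login"):
        print(url, looks_like_login_page(url))
```

In the spider, `parse_items` could start with `if looks_like_login_page(response.url): self.logger.error("Redirected to %s", response.url); return`. `response.url` is the final URL after redirects, and Scrapy's RedirectMiddleware records the URLs the request passed through in the `redirect_urls` meta key, so `response.meta.get("redirect_urls")` shows where the redirect started.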
Both Facebook and LinkedIn use CSRF tokens. You have to first GET the page with the login form, parse the HTML to extract the CSRF token, and only then make a POST request with the username/password and the CSRF token.
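Scrapy's `FormRequest.from_response` pre-populates the request with the fields of the selected HTML form, hidden `<input>` fields included. A token that sits in the fetched page's form is therefore normally sent along. A token added by JavaScript or set only in a cookie is not in the HTML, so neither `from_response` nor a parser like the one below will see it. The standard-library sketch spells out the three steps: take the HTML from the GET, collect the hidden fields of the form named `login_form`, add the credentials, and build the POST body. It assumes the token is a hidden input of that form. The field name `csrf_token` in the usage example is hypothetical, because real sites use their own names.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


class HiddenInputCollector(HTMLParser):
    """Collect name/value pairs of hidden <input> fields inside the
    <form> whose name attribute equals `form_name`."""

    def __init__(self, form_name):
        super().__init__()
        self.form_name = form_name
        self.in_form = False
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self.in_form = attrs.get("name") == self.form_name
        elif (self.in_form and tag == "input"
              and (attrs.get("type") or "").lower() == "hidden"
              and attrs.get("name")):
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def build_login_body(login_html, form_name, email, password):
    """Return the urlencoded POST body: hidden fields (e.g. the CSRF
    token) from the GET response plus the credentials."""
    parser = HiddenInputCollector(form_name)
    parser.feed(login_html)
    parser.close()
    data = dict(parser.fields)  # token(s) and other hidden fields
    data.update({"email": email, "pass": password})
    return urlencode(data)
```

For example, `build_login_body(html, "login_form", "my_email@example.com", "test!")` on a page containing `<form name="login_form"><input type="hidden" name="csrf_token" value="abc123">...</form>` yields a body carrying `csrf_token`, `email` and `pass`. The session cookies from the GET must also be sent with the POST; Scrapy's cookie middleware does this automatically.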