
How to scrape a webpage after a login using Python?

I need to scrape Facebook for my posts after I have logged in, but I don't know how to "keep the connection alive".

I'm using urllib, and I know how to connect to a server, get the page, and send data, but I have no idea how to handle the cookies so that I can reach a page that requires a login. I found that I need cookielib to do the job, but I cannot find a tutorial or anything else that explains how to get it done.

Can you help me in some way, or point me to a tutorial?

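Please don't scrape your Facebook page; it violates the terms and conditions. Use the Graph API instead, which lets you register an application that can fetch your posts.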

To do this, you need to maintain a CookieJar in your application. The cookielib library (http.cookiejar in Python 3) works like a plugin for the Python HTTP client: it lets you persist cookies (such as the login token that you are after) across your scraping session.
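As a rough illustration of the CookieJar idea, here is a minimal sketch using the Python 3 standard library (http.cookiejar and urllib.request, the successors of cookielib and urllib2). The helper names make_session, login and fetch are made up for this example, and the form field names a real site expects are something you would have to look up in its login form.

```python
import http.cookiejar
import urllib.parse
import urllib.request


def make_session():
    """Return an opener that stores and resends cookies, plus its jar."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar)
    )
    return opener, jar


def login(opener, login_url, fields):
    """POST the login form; cookies set by the server end up in the jar."""
    data = urllib.parse.urlencode(fields).encode("utf-8")
    with opener.open(login_url, data=data) as response:
        return response.status


def fetch(opener, url):
    """GET a page; the opener sends back the cookies held in its jar."""
    with opener.open(url) as response:
        return response.read().decode("utf-8", errors="replace")
```

You would call make_session() once, pass the opener to login() with the site's login URL and form fields, and then reuse that same opener for every fetch(). Because all requests go through one opener, the session cookie received at login is sent automatically with the later requests.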

Note that you might need to specify a valid user agent for Facebook to accept your request.
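One way to set the user agent with urllib is through the opener's addheaders list. The sketch below assumes Python 3; the USER_AGENT string is only an example value and with_user_agent is a hypothetical helper name, not part of any library.

```python
import urllib.request

# Example value only: use whatever string your own browser sends.
USER_AGENT = (
    "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0"
)


def with_user_agent(opener, user_agent=USER_AGENT):
    """Replace the opener's default User-Agent header with a custom one."""
    # Drop the default "Python-urllib/x.y" entry, keep any other headers.
    opener.addheaders = [
        (name, value)
        for name, value in opener.addheaders
        if name.lower() != "user-agent"
    ]
    opener.addheaders.append(("User-Agent", user_agent))
    return opener


opener = with_user_agent(urllib.request.build_opener())
```

Every request made through this opener then carries the custom header. The same call works on an opener that already has a cookie processor attached.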

Why not use the existing Python for Facebook library? If you're just looking to hook into the API and post / retrieve status messages, I can't see it being that complicated.

