
How do I log in with a web crawler/scraper?

I want to create a program that will scrape my accounts' reading lists on several sites and add them to my Safari reading list. However, I can't just crawl the normal link, since it requires a login.

How do I get past this?

You are probably using HTTP GET requests to load HTML pages. To log in, you need to send an HTTP POST request that carries the proper credentials (username and password).
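As a rough sketch of the difference, the snippet below builds a form-encoded POST request with Python's standard urllib instead of a plain GET. The URL, the credentials, the field names username and password, and the helper name build_login_request are all assumptions for illustration; the real names depend on the site's login form.

```python
import urllib.parse
import urllib.request
from http.cookiejar import CookieJar


def build_login_request(login_url, username, password):
    """Return a POST request carrying form-encoded credentials.

    The field names 'username' and 'password' are assumptions; check the
    site's login form for the names it actually expects.
    """
    form = urllib.parse.urlencode({'username': username, 'password': password})
    return urllib.request.Request(
        login_url,
        data=form.encode('ascii'),  # supplying data makes this a POST
        headers={'Content-Type': 'application/x-www-form-urlencoded'},
    )


# An opener with a cookie jar keeps the session cookie for later requests.
opener = urllib.request.build_opener(
    urllib.request.HTTPCookieProcessor(CookieJar()))

request = build_login_request('http://www.fakebook.com/accounts/login/',
                              'alice', 'secret')
print(request.get_method())  # POST
# opener.open(request) would send it; omitted so the sketch runs offline.
```

Reusing the same opener for later GET requests sends the stored session cookie along, which is what lets the crawler reach pages behind the login.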

Below is an example of an HTTP POST message to log into a social networking website:

# Assumes host, username, password, csrftoken and sessionid are already defined.
# Build the POST body first so that Content-Length can be computed from it.
post_body = ('username=' + username + '&password=' + password +
             '&csrfmiddlewaretoken=' + csrftoken + '&next=/fakebook/')

post_message = ('POST /accounts/login/ HTTP/1.1\r\n'
                'Host: www.fakebook.com\r\n'
                'Connection: keep-alive\r\n'
                'Content-Length: ' + str(len(post_body)) + '\r\n'
                'Origin: http://' + host + '\r\n'
                # The User-Agent value is truncated in the original post.
                'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/46.0.2486.0 Safari/537.36 Edge/13.1058\r\n'
                'Content-Type: application/x-www-form-urlencoded\r\n'
                'Accept-Encoding: gzip, deflate\r\n'
                'Cookie: csrftoken=' + csrftoken + '; sessionid=' + sessionid + '\r\n'
                '\r\n')  # a blank line ends the headers

# Append the POST body after the headers.
post_message += post_body
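One way to send a hand-built message like this is over a plain TCP socket. The original answer does not show this step, so the sketch below is an assumption about how the message might be used. send_http_message is a hypothetical helper. Because a keep-alive connection stays open after the reply, it stops reading when the peer closes the connection or when no data arrives within the timeout.

```python
import socket


def send_http_message(sock, message, timeout=5.0):
    """Send a raw HTTP message on a connected socket and return the reply bytes.

    Reads until the peer closes the connection or no data arrives within
    `timeout` seconds. Assumes the message contains only ASCII characters.
    """
    sock.settimeout(timeout)
    sock.sendall(message.encode('ascii'))
    chunks = []
    while True:
        try:
            chunk = sock.recv(4096)
        except socket.timeout:
            break  # nothing more arrived in time
        if not chunk:
            break  # peer closed the connection
        chunks.append(chunk)
    return b''.join(chunks)


# Usage (needs network access, so it is left commented out):
# sock = socket.create_connection(('www.fakebook.com', 80))
# response = send_http_message(sock, post_message)
# sock.close()
```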

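You will have to extract the sessionid and csrftoken values from the server's response to the GET request for the login page before sending this POST. They are typically delivered as cookies (Set-Cookie headers), and the CSRF token may also appear as a hidden field in the login form's HTML.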
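As an illustration of that extraction step, the sketch below pulls cookie values out of the Set-Cookie headers of a raw response using Python's standard http.cookies module. It assumes the site delivers csrftoken and sessionid as cookies when the login page is fetched. The sample response and the helper name extract_cookies are made up for the example.

```python
from http.cookies import SimpleCookie


def extract_cookies(raw_response):
    """Collect cookie name/value pairs from the Set-Cookie headers of a raw
    HTTP response given as text (headers and body separated by a blank line)."""
    header_text = raw_response.split('\r\n\r\n', 1)[0]
    cookies = {}
    for line in header_text.split('\r\n')[1:]:  # skip the status line
        name, _, value = line.partition(':')
        if name.strip().lower() == 'set-cookie':
            parsed = SimpleCookie()
            parsed.load(value.strip())
            for key, morsel in parsed.items():
                cookies[key] = morsel.value
    return cookies


# Made-up response, for illustration only.
sample = ('HTTP/1.1 200 OK\r\n'
          'Content-Type: text/html\r\n'
          'Set-Cookie: csrftoken=abc123; Path=/\r\n'
          'Set-Cookie: sessionid=xyz789; Path=/\r\n'
          '\r\n'
          '<html>login form</html>')

tokens = extract_cookies(sample)
csrftoken = tokens['csrftoken']
sessionid = tokens['sessionid']
```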

For more information on HTTP status codes, see http://www.jmarshall.com/easy/http/
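To act on the status code of a reply, the status line can be parsed directly. This sketch assumes the raw response is available as bytes. parse_status_code is a hypothetical helper, and treating a redirect as a sign of a successful login is an assumption that holds for many sites but not all.

```python
def parse_status_code(raw_response):
    """Return the integer status code from the first line of a raw HTTP response."""
    status_line = raw_response.split(b'\r\n', 1)[0].decode('ascii')
    # A status line looks like: HTTP/1.1 302 Found
    parts = status_line.split(' ', 2)
    if len(parts) < 2 or not parts[0].startswith('HTTP/'):
        raise ValueError('not an HTTP status line: ' + repr(status_line))
    return int(parts[1])


code = parse_status_code(b'HTTP/1.1 302 Found\r\nLocation: /fakebook/\r\n\r\n')
if 300 <= code < 400:
    print('redirected, login probably accepted')
elif code == 200:
    print('page returned, check the body for a login error')
else:
    print('request failed with status', code)
```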


 