
Why doesn't my program stay logged into a website after connecting?

I am trying to scrape information about job applicants, but after s.post(login_url, data=payload) the session is reset and the program no longer has access to the site's content. Logging in on its own works fine, but when I then request interviews_url = ('https://www.sparkhire.com/company/interviews') I am logged out. Am I doing something wrong?

from bs4 import BeautifulSoup
import requests
import token_scraper

login_url = ('https://www.sparkhire.com/login')
interviews_url = ('https://www.sparkhire.com/company/interviews')
payload = {
    '_token':token_scraper.token,
    'email':'censored', 
    'password':'censored'
}

with requests.session() as s:
    s.post(login_url, data=payload)
    r = s.get(interviews_url)
    soup = BeautifulSoup(r.content, 'html.parser')
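    print(soup)

Answer: load the login page and read the _token inside the same session that posts the credentials.

from bs4 import BeautifulSoup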
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:94.0) Gecko/20100101 Firefox/94.0'
}


def get_soup(content):
    return BeautifulSoup(content, 'lxml')


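def main(url):
    with requests.Session() as req:
        req.headers.update(headers)
        # Load the login page first so this session receives its cookies
        # and the hidden _token field that goes with them.
        r = req.get(url)
        soup = get_soup(r.text)
        data = {
            "_token": soup.select_one('input[name=_token]')['value'],
            "email": "any@any.com",
            "password": "yourpass"
        }
        # Log in with the same session, then request the protected page.
        req.post(url, data=data)
        r = req.get('https://www.sparkhire.com/company/interviews')
        # Save the raw response so it can be inspected in a browser.
        with open('view.html', 'wb') as f:
            f.write(r.content)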


main('https://www.sparkhire.com/login')
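
A likely reason the code above works where the question's code does not: the hidden _token field is usually tied to the session cookie issued with the login page, so a token obtained by a separate request (as token_scraper presumably does; that module is not shown) may be rejected. The sketch below illustrates the same flow with an explicit check that the login succeeded. It is only an illustration under stated assumptions: the names extract_token, redirected_to_login and login_and_fetch are hypothetical, the standard-library HTML parser is swapped in for BeautifulSoup to keep the block dependency-free, and the idea that an unauthenticated request ends up back on the login URL is an assumption about the site, not something confirmed here.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class TokenParser(HTMLParser):
    """Remembers the value of the first <input name="_token"> it sees."""

    def __init__(self):
        super().__init__()
        self.token = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and attrs.get("name") == "_token" and self.token is None:
            self.token = attrs.get("value")


def extract_token(html):
    """Return the hidden _token value from a login form, or None if absent."""
    parser = TokenParser()
    parser.feed(html)
    return parser.token


def redirected_to_login(final_url, login_url):
    """True if the final URL has the same path as the login page.

    Assumption: the site sends unauthenticated visitors back to the login URL.
    """
    return urlsplit(final_url).path.rstrip("/") == urlsplit(login_url).path.rstrip("/")


def login_and_fetch(session, login_url, target_url, email, password):
    """Log in and fetch target_url, using one session for every request.

    `session` is expected to behave like requests.Session (get/post methods,
    responses with .text and .url).
    """
    # The token must come from a page loaded by this same session.
    token = extract_token(session.get(login_url).text)
    if token is None:
        raise RuntimeError("no _token field found on the login page")
    session.post(login_url, data={"_token": token, "email": email, "password": password})
    response = session.get(target_url)
    if redirected_to_login(response.url, login_url):
        raise RuntimeError("still on the login page; the login was probably rejected")
    return response
```

With requests installed, this could be called as login_and_fetch(requests.Session(), login_url, interviews_url, email, password). Raising an error when the final URL is still the login page makes a failed login visible immediately, instead of silently saving the login form as if it were the interviews page.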
