
BeautifulSoup requests.post not scraping correctly

I am attempting to scrape a site that requires a login:

login_url = 'https://www.spotrac.com/signin/'
data = {
    'email': 'a****@gmail.com',
    'password': '******'
}

with requests.Session() as s:
    response = requests.post(login_url, data)
    index_page = s.get('https://www.spotrac.com/nba/contracts/breakdown/2010/')
    soup = BeautifulSoup(index_page.text, 'html.parser')

This code scrapes the page, but only as if I hadn't logged in, i.e. none of the data that a successful login should return is present.

Where am I going wrong here?

I think you are sending your username and password to the wrong URL. https://www.spotrac.com/signin is the page that shows the login fields, but https://www.spotrac.com/signin/submit/ is the page your credentials get sent to when you click submit.
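One way to check where a login form actually posts is to read the form's action attribute from the sign-in page. The following is a minimal sketch using only the standard library. The sample HTML is made up for illustration and is not Spotrac's real markup, and FormActionFinder and find_form_actions are hypothetical helper names:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FormActionFinder(HTMLParser):
    """Collects the action attribute of every <form> tag."""

    def __init__(self):
        super().__init__()
        self.actions = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            # A form without an action posts back to the page's own URL.
            self.actions.append(dict(attrs).get("action") or "")


def find_form_actions(html, page_url):
    """Return the absolute URLs that the forms in `html` submit to."""
    finder = FormActionFinder()
    finder.feed(html)
    return [urljoin(page_url, action) for action in finder.actions]


# Made-up markup for illustration only, not copied from the real site.
sample_html = """
<form method="post" action="/signin/submit/">
  <input name="email"><input name="password" type="password">
</form>
"""
print(find_form_actions(sample_html, "https://www.spotrac.com/signin/"))
```

In practice the html argument would be the text of the sign-in page fetched with s.get(...). The browser's network tab shows the same information when you submit the form by hand.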

I was not able to test this code, because I don't want to pay $30.

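import requests
from bs4 import BeautifulSoup

login_url = 'https://www.spotrac.com/signin/submit/'
data = {
    'email': 'a****@gmail.com',
    'password': '******'
}

with requests.Session() as s:
    # Post through the session (s.post, not requests.post) so the login
    # cookies are stored on s and sent with the next request.
    response = s.post(login_url, data=data)
    index_page = s.get('https://www.spotrac.com/nba/contracts/breakdown/2010/')
    soup = BeautifulSoup(index_page.text, 'html.parser')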
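Besides the URL, it matters which object sends the POST: the login cookie only reaches the later s.get(...) if the POST goes through the same session, whereas the module-level requests.post uses a throwaway session and discards the cookie. The sketch below illustrates this against a tiny local stand-in server. FakeSite, the /login and /data paths, and the sessionid cookie are all hypothetical and say nothing about how Spotrac itself works:

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

import requests


class FakeSite(BaseHTTPRequestHandler):
    """Tiny stand-in for a site with a cookie-based login (not Spotrac)."""

    def do_POST(self):
        # Consume the form body, then "log in" by setting a cookie.
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        self.send_response(200)
        self.send_header("Set-Cookie", "sessionid=abc123; Path=/")
        self.send_header("Content-Length", "0")
        self.end_headers()

    def do_GET(self):
        # Serve member content only when the login cookie comes back.
        logged_in = "sessionid=abc123" in self.headers.get("Cookie", "")
        body = b"member data" if logged_in else b"public data"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo output quiet


server = HTTPServer(("127.0.0.1", 0), FakeSite)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
base = f"http://127.0.0.1:{server.server_port}"
form = {"email": "user@example.com", "password": "secret"}

with requests.Session() as s:
    requests.post(base + "/login", data=form)  # cookie is discarded
    without_session = s.get(base + "/data").text

with requests.Session() as s:
    s.post(base + "/login", data=form)  # cookie is stored on s
    with_session = s.get(base + "/data").text

server.shutdown()
server.server_close()
print(without_session, "|", with_session)
```

A similar check works on a real site: after logging in, inspect s.cookies or look for text that only appears for signed-in users before parsing the page.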

