
How to Log In and Scrape Websites with Python?

I understand there are similar questions out there, but I couldn't get this code to work. Does anyone know how to log in and scrape the data from this website?

from bs4 import BeautifulSoup
import requests

# Start the session
session = requests.Session()

# Create the payload (replace the placeholders with your own credentials)
payload = {
    'login': '<USERNAME>',
    'password': '<PASSWORD>',
}

# Post the payload to the site to log in
s = session.post("https://www.beeradvocate.com/community/login", data=payload)

# Navigate to the next page and scrape the data
s = session.get('https://www.beeradvocate.com/place/list/?c_id=AR&s_id=0&brewery=Y')

# Parse the page and print only the element of interest
soup = BeautifulSoup(s.text, 'html.parser')
title_bar = soup.find('div', class_='titleBar')
print(title_bar)

The process differs from site to site. The best way to find out how a particular site handles login is to open your browser's request inspector (for example, the Network tab in Firefox's developer tools) and watch what the site does when you log in.

For your website, clicking the login button sends a POST request to https://www.beeradvocate.com/community/login/login (not to https://www.beeradvocate.com/community/login, the URL used in the question). With a little trial and error you should be able to replicate it.
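As a rough sketch of what that could look like with requests: the login URL is the one observed above, while the form field names 'login' and 'password' are simply carried over from the question and are an assumption. The helper names build_login_payload and login_and_fetch are made up for this example, and the site may expect extra hidden fields that only the inspector will reveal.

```python
import requests

# Endpoint the browser posts to, as observed in the request inspector.
LOGIN_URL = "https://www.beeradvocate.com/community/login/login"
LIST_URL = "https://www.beeradvocate.com/place/list/?c_id=AR&s_id=0&brewery=Y"


def build_login_payload(username, password):
    """Build the login form data.

    The field names are assumed from the question; compare them with the
    form data shown in the browser and add any hidden fields you find there.
    """
    return {"login": username, "password": password}


def login_and_fetch(session, username, password):
    """Log in with the given session, then return the HTML of the list page."""
    login_response = session.post(
        LOGIN_URL, data=build_login_payload(username, password)
    )
    login_response.raise_for_status()  # stop early on HTTP errors

    # The session keeps the cookies set during login and reuses them here.
    page = session.get(LIST_URL)
    page.raise_for_status()
    return page.text


if __name__ == "__main__":
    with requests.Session() as session:
        html = login_and_fetch(session, "<USERNAME>", "<PASSWORD>")
        print(html[:500])
```

The returned HTML can be passed to BeautifulSoup exactly as in the question. A 200 status alone does not prove the login worked, so it is worth checking that the page contains something only a logged-in user would see.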

Make sure your request matches the Content-Type and the other request headers the browser sends (especially cookies, in case the site needs auth tokens).
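One way to do that comparison is to prepare the request without sending it and print what requests would put on the wire, then hold it next to the browser's version. This is only a sketch: preview_login_request is a hypothetical helper, the field names come from the question, and the User-Agent and Referer values are placeholders to be replaced with whatever your inspector shows.

```python
import requests

LOGIN_URL = "https://www.beeradvocate.com/community/login/login"


def preview_login_request(session, username, password):
    """Return what the session would send for the login POST, without sending it."""
    request = requests.Request(
        "POST",
        LOGIN_URL,
        data={"login": username, "password": password},  # assumed field names
    )
    # prepare_request merges the session's headers and cookies into the request.
    prepared = session.prepare_request(request)
    return {
        "method": prepared.method,
        "url": prepared.url,
        "headers": dict(prepared.headers),
        "body": prepared.body,
    }


if __name__ == "__main__":
    with requests.Session() as session:
        # Placeholder header values; copy the real ones from the browser.
        session.headers.update({
            "User-Agent": "Mozilla/5.0",
            "Referer": "https://www.beeradvocate.com/community/login",
        })
        for key, value in preview_login_request(
            session, "<USERNAME>", "<PASSWORD>"
        ).items():
            print(key, "->", value)
```

Any header or cookie that appears in the browser's request but not in this output is a candidate for what the site is rejecting.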

