
Unable to log in to the website with Requests

I'm trying to log in to this website: https://archiwum.polityka.pl/sso/loginform to scrape some articles.

Here is my code:

import requests
from bs4 import BeautifulSoup

login_url = 'https://archiwum.polityka.pl/sso/loginform'
base_url = 'http://archiwum.polityka.pl'

payload = {"username" : "XXXXX", "password" : "XXXXX"}
headers = {"User-Agent" : "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:61.0) Gecko/20100101 Firefox/61.0"}

with requests.Session() as session:

    # Login...
    request = session.get(login_url, headers=headers)
    post = session.post(login_url, data=payload)

    # Now I want to go to the page with a specific article
    article_url = 'https://archiwum.polityka.pl/art/na-kanapie-siedzi-len,393566.html'
    request_article = session.get(article_url, headers=headers)

    # Scrape its content
    soup = BeautifulSoup(request_article.content, 'html.parser')
    content = soup.find('p', {'class' : 'box_text'}).find_next_sibling().text.strip()

    # And print it.
    print(content)

But my output is like this:

... [pełna treść dostępna dla abonentów Polityki Cyfrowej]

Which in my native language means:

... [full content available for subscribers of the Polityka Cyfrowa]

My credentials are correct: I have full access to the content from the browser, but not with Requests.

I will be grateful for any suggestions as to how I can do this with Requests. Or do I have to use Selenium for this?

I can help you with the login procedure. The rest, I suppose, you can manage yourself. Your payload doesn't contain all the information necessary to get a valid response. Fill in the username and password fields in the script below and run it. You should then see the same profile name you see in the browser when you are logged in to that webpage.

import requests
from bs4 import BeautifulSoup

# The login endpoint also expects redirect targets for success and failure,
# not just the credentials.
payload = {
    'username': 'username here',
    'password': 'your password here',
    'login_success': 'http://archiwum.polityka.pl',
    'login_error': 'http://archiwum.polityka.pl/sso/loginform?return=http%3A%2F%2Farchiwum.polityka.pl'
}

with requests.Session() as session:
    session.headers = {"User-Agent": "Mozilla/5.0"}
    # POST the credentials to the SSO endpoint, not to the login form URL.
    page = session.post('https://www.polityka.pl/sso/login', data=payload)
    soup = BeautifulSoup(page.text, "lxml")
    # The logged-in profile name appears in this element on the returned page.
    profilename = soup.select_one("#container p span.border").text
    print(profilename)
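
Once the profile name prints correctly, the same authenticated session can be reused to fetch the article from your question. A minimal sketch, placed inside the same with requests.Session() block and assuming the box_text selector from your original code still matches the logged-in page:

    # Reuse the logged-in session for the article request from the question.
    article_url = 'https://archiwum.polityka.pl/art/na-kanapie-siedzi-len,393566.html'
    article_page = session.get(article_url)
    article_soup = BeautifulSoup(article_page.text, "lxml")
    # The article body is the sibling that follows the 'box_text' paragraph.
    content = article_soup.find('p', {'class': 'box_text'}).find_next_sibling().text.strip()
    print(content)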
