
Avoiding simultaneous sessions with requests.Session()?

I am trying to scrape a medical journal's website ( https://www.edimark.fr/ , to which I am a paid subscriber) for links to PDF articles. Authentication is required to access the articles, and I have been able to log in successfully using requests.post . My problem is that the site appears to be logging me out after requests fetches 5 links (I can tell because when I test-print the HTML for each link, after 5 links my account name is replaced by the generic username). When I go to the website manually to log in again, the site tells me that I cannot have more than 5 simultaneous active sessions and that I must close one of them to log in again. Even after I terminate the command, I still cannot log in for some time.

My question is: is there a way to get around this by restarting, logging out of, or terminating each session before the loop (shown below) repeats, to avoid being kicked off? (See the sketch after my code below for what I mean.) Or is there any other solution? The code that I am using (with username and password hidden) is below. You can see that I am iterating the request through 14 links. I would have thought that running this under with requests.Session() as s: would keep me logged in for the entire duration, but that doesn't appear to be the case. I have also included s.get('https://www.edimark.fr/deconnexion') (the logout URL) at the end of the loop, but that doesn't seem to help either.

import requests
from bs4 import BeautifulSoup

login_url = "https://www.edimark.fr/front/frontlogin/index"
base_url = 'https://www.edimark.fr/resultat-recherche/magazine/14/page:'


payload = "data%5BUser%5D%5Bemail%5D=XXXXX&data%5BUser%5D%5Bpassword%5D=XXXXX&_method=POST"
headers = {
    'content-type': "application/x-www-form-urlencoded",
    'user-agent': "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36"
    }

with requests.Session() as s:
    s.post(login_url, data=payload, headers=headers)  # log in once
    for i in range(1, 15):  # pages 1 through 14
        target_url = base_url + str(i)
        response = s.get(target_url, headers=headers)
        soup = BeautifulSoup(response.content, 'html.parser')
        s.get('https://www.edimark.fr/deconnexion')  # the logout URL
        print(soup.text)
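
To make the restart idea concrete, here is the kind of per-page restart I have in mind: open a fresh session for each page, log in, fetch, and explicitly log out before the session closes. This is just a sketch of the idea, reusing the payload, headers, login_url, and base_url defined above; I haven't confirmed it actually avoids the session limit:

import requests
from bs4 import BeautifulSoup

# payload, headers, login_url, and base_url as defined above

for i in range(1, 15):
    # Each page gets its own short-lived session...
    with requests.Session() as s:
        s.post(login_url, data=payload, headers=headers)
        response = s.get(base_url + str(i), headers=headers)
        soup = BeautifulSoup(response.content, 'html.parser')
        print(soup.title)
        # ...and is explicitly logged out before the session closes.
        s.get('https://www.edimark.fr/deconnexion')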

Please note that I am very new to any sort of coding/programming, so forgive me if I am not using the correct verbiage. I'm sure that the code above isn't the most elegant either. Any suggestions or advice would be much appreciated, preferably in simpler terms if at all possible.

Well, since I don't have a valid email/password to validate the request, we need to walk through this together in order to catch the main issue.

Kindly run the following code and let me know whether you get an AssertionError or not.

import requests

data = {
    '_method': 'POST',
    'data[User][email]': 'email@email.com',
    'data[User][password]': 'yourpassword'
}


def login(url):
    with requests.Session() as req:
        # If the login succeeds, the returned page should contain the account email.
        r = req.post(url, data=data, allow_redirects=True)
        assert "email@email.com" in r.text


login("https://www.edimark.fr/front/frontlogin/index")

Please replace email@email.com and yourpassword with your credentials.
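
If the assertion passes, the login itself is fine, and my suggestion would be the following sketch: log in once, fetch all the pages inside that single session, and call the logout URL ( https://www.edimark.fr/deconnexion ) only once at the end. That way each run of the script opens exactly one session on the server instead of stacking up new ones. Untested on my side, since I don't have valid credentials:

import requests
from bs4 import BeautifulSoup

data = {
    '_method': 'POST',
    'data[User][email]': 'email@email.com',
    'data[User][password]': 'yourpassword'
}

base_url = "https://www.edimark.fr/resultat-recherche/magazine/14/page:"

with requests.Session() as req:
    # Log in once; the Session object keeps the auth cookies for every later request.
    r = req.post("https://www.edimark.fr/front/frontlogin/index", data=data, allow_redirects=True)
    assert "email@email.com" in r.text

    for page in range(1, 15):
        response = req.get(base_url + str(page))
        soup = BeautifulSoup(response.content, 'html.parser')
        print(soup.title)  # collect the PDF links here instead

    # Log out once at the end so this run leaves no open session behind.
    req.get("https://www.edimark.fr/deconnexion")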
