I am trying to scrape a medical journal's website (https://www.edimark.fr/, to which I am a paid subscriber) for links to PDF articles. Authentication is required to access the articles, and I have been able to log in successfully using requests.post.
My problem is that the site appears to be logging me out after requests fetches 5 links (I can tell because when I print the HTML for each link, after 5 links my account name is replaced by a generic username). When I go to the website manually to log in again, the site tells me that I cannot have more than 5 simultaneous active sessions and that I must close one of them to log in again. Even after I terminate the command, I still cannot log in for some time.
My question is: is there a way to get around this by restarting/logging out of/terminating each session before the loop (shown below) reiterates, to avoid being kicked off? Or is there another solution? The code I am using (with username and password hidden) is below; you can see that I am iterating the request through 14 links. I would have thought that running this under with requests.Session() as s:
would keep me logged in for the entire duration, but that doesn't appear to be the case. I have also included s.get('https://www.edimark.fr/deconnexion')
(the logout URL) at the end of the loop, but that doesn't seem to help either.
import requests
from bs4 import BeautifulSoup

login_url = "https://www.edimark.fr/front/frontlogin/index"
base_url = 'https://www.edimark.fr/resultat-recherche/magazine/14/page:'
payload = "data%5BUser%5D%5Bemail%5D=XXXXX&data%5BUser%5D%5Bpassword%5D=XXXXX&_method=POST"
headers = {
    'content-type': "application/x-www-form-urlencoded",
    'user-agent': "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.132 Safari/537.36"
}

with requests.Session() as s:
    s.post(login_url, data=payload, headers=headers)
    for i in range(1, 15):  # pages 1 through 14
        target_url = base_url + str(i)
        response = s.get(target_url, headers=headers)
        soup = BeautifulSoup(response.content, 'html.parser')
        s.get('https://www.edimark.fr/deconnexion')
        print(soup.text)
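For comparison, this is the restructured version I would have expected to keep a single session alive: log in once, fetch all 14 pages, and only then hit the logout URL. The `page_urls` helper is just for clarity; the URLs, credentials placeholder, and `/deconnexion` logout endpoint are the same as above.

```python
import requests

LOGIN_URL = "https://www.edimark.fr/front/frontlogin/index"
BASE_URL = "https://www.edimark.fr/resultat-recherche/magazine/14/page:"
LOGOUT_URL = "https://www.edimark.fr/deconnexion"

# Same form-encoded credentials as above (XXXXX hides the real values).
PAYLOAD = "data%5BUser%5D%5Bemail%5D=XXXXX&data%5BUser%5D%5Bpassword%5D=XXXXX&_method=POST"
HEADERS = {'content-type': "application/x-www-form-urlencoded"}

def page_urls(base, last_page):
    """Build the search-result URLs for pages 1..last_page."""
    return [base + str(i) for i in range(1, last_page + 1)]

def main():
    from bs4 import BeautifulSoup  # third-party; only needed for the actual scrape
    with requests.Session() as s:
        s.post(LOGIN_URL, data=PAYLOAD, headers=HEADERS)  # log in once
        for url in page_urls(BASE_URL, 14):
            response = s.get(url)  # the Session re-sends the login cookies
            soup = BeautifulSoup(response.content, 'html.parser')
            print(soup.text)
        s.get(LOGOUT_URL)  # log out once, after the loop

# main()  # uncomment to run with real credentials
```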
Please note that I am very new to any sort of coding/programming, so forgive me if I am not using the correct verbiage. I'm sure that the code above isn't the most elegant either. Any suggestions or advice would be much appreciated, preferably in simpler terms if at all possible.
Since I don't have a valid email/password to validate the request, we'll need to work through this together to pin down the main issue. Please run the following code and let me know whether you get an AssertionError or not.
import requests

data = {
    '_method': 'POST',
    'data[User][email]': 'email@email.com',
    'data[User][password]': 'yourpassword'
}

def login(url):
    with requests.Session() as req:
        # POST the login form; if the response page shows the email,
        # the login succeeded.
        r = req.post(url, data=data, allow_redirects=True)
        assert "email@email.com" in r.text

login("https://www.edimark.fr/front/frontlogin/index")
Please replace
email@email.com
and yourpassword
with your credentials.
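If that assertion passes, the login itself works, and the next step would be to keep the whole scrape inside that one session, logging out exactly once at the end. A sketch building on the code above: the `still_logged_in` check relies on the question's observation that the site swaps the account name for a generic username once a session is dropped, and the page URLs and `/deconnexion` logout endpoint are taken from the question.

```python
import requests

DATA = {
    '_method': 'POST',
    'data[User][email]': 'email@email.com',
    'data[User][password]': 'yourpassword',
}

def still_logged_in(html, account_name):
    """True while the page shows the subscriber's name; the site swaps it
    for a generic username once the session has been dropped."""
    return account_name in html

def scrape(login_url, base_url, pages):
    """Log in once, fetch every page inside the same session, log out once."""
    collected = []
    with requests.Session() as req:
        r = req.post(login_url, data=DATA, allow_redirects=True)
        assert still_logged_in(r.text, DATA['data[User][email]'])
        for i in range(1, pages + 1):
            page = req.get(base_url + str(i)).text
            if not still_logged_in(page, DATA['data[User][email]']):
                raise RuntimeError('session dropped at page %d' % i)
            collected.append(page)
        req.get('https://www.edimark.fr/deconnexion')  # one logout, after the loop
    return collected

# With real credentials filled in:
# pages = scrape('https://www.edimark.fr/front/frontlogin/index',
#                'https://www.edimark.fr/resultat-recherche/magazine/14/page:', 14)
```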