Authentication while scraping via BeautifulSoup in Python

I have created a piece of code to scrape an article off the ft.com website.

import bs4
import requests
url = ""
r = requests.get(url)
soup = bs4.BeautifulSoup(r.content, "html.parser")
# Print every <div id="storyContent"> found in the page
for a in soup.find_all('div', {"id": "storyContent"}):
    print(a)

1) On the website, there is a div tag with id storyContent, but this code produces no output, which means it never entered the loop at all! What might the reason be?
Now, ft.com does not give access to articles without entering a username and password.
I have logged into ft.com using Chrome.
Suppose my username and password are the following:
Username : bs@sb.com
Pass: 12345
I need to know any of the following:
2) How can I provide this authentication in my code?
3) How can I use the Chrome session (in which I'm already logged in) to access the webpage/article details?
4) Is authentication the reason behind the empty output?
5) I am trying to get the article's body out of the webpage.
Thanks!

Start with this instead, and look at what the site actually returns:

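import requests
from bs4 import BeautifulSoup
url = "http://www.ft.com"
r = requests.get(url)
soup = BeautifulSoup(r.content, "html.parser")
# Dump the parsed page to see what the server actually returned
for a in soup:
    print(a)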

Then, once you have found the key:value pairs the login form requires, send them in a POST request:

r = requests.post('http://www.ft.com/xxx', data={'key': 'value'})
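
To tie this back to questions 2) and 3), here is a minimal sketch of both approaches. It is not ft.com's actual login flow: the login URL you pass in would be whatever the real form posts to (the 'http://www.ft.com/xxx' above is only a placeholder), and the form field names and cookie names are hypothetical until you read the real ones from the login form and from Chrome's developer tools. The function names are my own.

```python
import requests
from bs4 import BeautifulSoup


def extract_story(html):
    """Return the text of <div id="storyContent">, or None if it is absent."""
    soup = BeautifulSoup(html, "html.parser")
    div = soup.find("div", {"id": "storyContent"})
    if div is None:
        return None
    return div.get_text(" ", strip=True)


def fetch_with_login(article_url, login_url, form_data):
    """Question 2: log in with a POST, then fetch the article.

    form_data holds the key:value pairs of the login form. The real
    field names are an assumption here and must be taken from the form.
    """
    # A Session keeps the cookies set by the login response and sends
    # them again on the following request.
    with requests.Session() as session:
        login = session.post(login_url, data=form_data)
        login.raise_for_status()
        page = session.get(article_url)
        page.raise_for_status()
        return extract_story(page.text)


def fetch_with_browser_cookies(article_url, cookies):
    """Question 3: reuse cookies copied by hand from the logged-in browser.

    cookies is a plain dict of cookie name -> value. Which cookies the
    site needs is an assumption you have to check in the browser.
    """
    page = requests.get(article_url, cookies=cookies)
    page.raise_for_status()
    return extract_story(page.text)
```

You would call it with something like fetch_with_login(article_url, 'http://www.ft.com/xxx', {'key': 'value'}). If either function returns None, the div was not in the HTML the server sent back, which often means the login did not go through or the page fills in its content with JavaScript.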

