Difficulties while using BeautifulSoup

Question

I am trying to scrape a website, but I am having difficulty extracting what I want:

import requests
from bs4 import BeautifulSoup
import time
from datetime import date, datetime, timedelta

url = 'https://cerbios.swiss/news-events/news/'

# Download the news page and parse its HTML
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')

# find() returns the first element whose class is 'entry-title'
results_date = soup.find(class_='entry-title')

print(results_date)

This is my code, and its output is:

<h3 class="entry-title">
<a href="https://cerbios.swiss/new-400-mhz-nmr-in-cerbios/" rel="bookmark" title="NEW 400 MHZ NMR IN CERBIOS">NEW 400 MHZ NMR IN CERBIOS</a>
</h3>

This is good, but what I really want is the href, so that the output is just the URL. I don't know how to do it. I tried this line:

results_url = soup.find(class_='entry-title')['href']

but it does not work, since the element with class 'entry-title' has no href attribute. Any help would be greatly appreciated.
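To see why that line fails, here is a minimal reproduction. It parses the HTML output shown above from an inline string instead of fetching the page, an assumption made only so the snippet runs offline. Subscripting a Tag with ['href'] raises KeyError when the attribute is missing, whereas Tag.get('href') returns None.

```python
from bs4 import BeautifulSoup

# The <h3> printed above, inlined so no network request is needed.
html = '''
<h3 class="entry-title">
<a href="https://cerbios.swiss/new-400-mhz-nmr-in-cerbios/" rel="bookmark" title="NEW 400 MHZ NMR IN CERBIOS">NEW 400 MHZ NMR IN CERBIOS</a>
</h3>
'''

soup = BeautifulSoup(html, 'html.parser')
heading = soup.find(class_='entry-title')  # this is the <h3>, not the <a>

print(heading.name)         # the tag name of the matched element
print(heading.get('href'))  # None: the <h3> itself has no href attribute

try:
    heading['href']
except KeyError as exc:
    # Subscripting a Tag with a missing attribute name raises KeyError.
    print('KeyError:', exc)

# The href lives on the nested <a> element instead.
print(heading.a['href'])
```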

Answer 1

You're trying to access an href attribute on the <h3> element, but that element has no such attribute; the href belongs to the <a> element nested inside it. You can either keep using find() to descend to the <a> element or use a more specific CSS selector.

soup.find(class_='entry-title').find('a')['href']

or

soup.select_one('h3.entry-title a')['href']
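Both approaches can be checked offline against a small inline HTML string. The sketch below assumes the news page repeats the same h3.entry-title > a markup for every post; the second entry and its example.com URL are hypothetical placeholders, not taken from the real site. It also uses select(), the list-returning counterpart of select_one(), to collect every link at once.

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the question's output; the second
# entry is an invented placeholder used only to show multiple matches.
html = '''
<h3 class="entry-title"><a href="https://cerbios.swiss/new-400-mhz-nmr-in-cerbios/">NEW 400 MHZ NMR IN CERBIOS</a></h3>
<h3 class="entry-title"><a href="https://example.com/another-post/">ANOTHER POST</a></h3>
'''

soup = BeautifulSoup(html, 'html.parser')

# Option 1: find the first <h3>, then descend to its <a> child.
first_via_find = soup.find(class_='entry-title').find('a')['href']

# Option 2: a single CSS selector that targets the <a> directly.
first_via_select = soup.select_one('h3.entry-title a')['href']

# select() returns every match, so this collects the URL of each entry.
all_urls = [a['href'] for a in soup.select('h3.entry-title a')]

print(first_via_find)
print(first_via_select)
print(all_urls)
```

If a page might not contain a matching element, both find() and select_one() return None, so check the result before subscripting it.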

Answer 1 (accepted), score 2, 2021-04-27 07:45:21