
Difficulties while using BeautifulSoup

I am trying to scrape a website, but I am having trouble collecting what I want:

import requests
from bs4 import BeautifulSoup
import time
from datetime import date, datetime, timedelta

url = 'https://cerbios.swiss/news-events/news/'
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')

results_date = soup.find(class_='entry-title')
print(results_date)

Here is the code that I have, and the output of this code is:

<h3 class="entry-title">
<a href="https://cerbios.swiss/new-400-mhz-nmr-in-cerbios/" rel="bookmark" title="NEW 400 MHZ NMR IN CERBIOS">NEW 400 MHZ NMR IN CERBIOS</a>
</h3>

This is good, but what I really want is the "href", so that the output is just the URL. I don't know how to do that. I tried this line: results_url = soup.find(class_='entry-title')['href'] but it does not work, since the element with class 'entry-title' does not have an "href" attribute. Any help would be appreciated.

You're trying to read an href attribute from the <h3> element, which does not have one; the href belongs to the <a> element nested inside it. You can either chain another find() to reach the <a> element or use a more specific CSS selector.

soup.find(class_='entry-title').find('a')['href']

or

soup.select_one('h3.entry-title a')['href']
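Both one-liners raise an exception if the heading or the link is missing, so here is a minimal sketch of a more defensive version. It parses an inline copy of the markup from the question instead of fetching the live page, and the helper names first_entry_url and all_entry_urls are hypothetical, chosen only for illustration.

```python
from bs4 import BeautifulSoup

# Inline copy of the markup printed in the question, so the sketch runs offline.
html = """
<h3 class="entry-title">
<a href="https://cerbios.swiss/new-400-mhz-nmr-in-cerbios/" rel="bookmark" title="NEW 400 MHZ NMR IN CERBIOS">NEW 400 MHZ NMR IN CERBIOS</a>
</h3>
"""

soup = BeautifulSoup(html, "html.parser")


def first_entry_url(soup):
    """Return the href of the first 'h3.entry-title a' link, or None if absent.

    Hypothetical helper name, not part of BeautifulSoup.
    """
    link = soup.select_one("h3.entry-title a")
    return link.get("href") if link is not None else None


def all_entry_urls(soup):
    """Return the href of every link nested inside an 'entry-title' element.

    Hypothetical helper name; the a[href] selector skips anchors without an href.
    """
    return [a["href"] for a in soup.select(".entry-title a[href]")]


# The chained find() from the answer; this raises if either element is missing.
url_via_find = soup.find(class_="entry-title").find("a")["href"]

print(url_via_find)
print(first_entry_url(soup))
print(all_entry_urls(soup))
```

If the real page lists several news items, the list-returning variant would presumably be the one to use, since find() and select_one() only ever return the first match.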


 