
How do I get certain links from a website, but not all of them?

Here's what I have so far:

import requests
from bs4 import BeautifulSoup

def linkScraper():
    html = requests.get("https://www.bbc.com/").text
    soup = BeautifulSoup(html, 'html.parser')
    
    for link in soup.find_all('a'):
        print(link.get('href'))

But this prints every single link on the website. How can I configure this to give me the links to the articles that appear on the BBC's homepage?

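You can filter them with a list comprehension. find_all('a') returns Tag objects rather than strings, so the check has to run on each tag's href value, with an empty-string default for anchors that have no href:

links = [link.get('href') for link in soup.find_all('a') if link.get('href', '').startswith('https://www.bbc.com/')]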
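
One caveat with a plain prefix check is that it misses relative hrefs such as "/news/...", and it still keeps navigation links that happen to be on the same host. The sketch below is one possible way around that, not a definitive recipe: it resolves each href against the base URL and keeps only paths that start with a chosen prefix. The helper name filter_article_links, the ARTICLE_PREFIXES values and the sample hrefs are all assumptions made for illustration; which prefixes actually mark article pages has to be checked against the live page.

```python
from urllib.parse import urljoin, urlparse

BASE_URL = "https://www.bbc.com/"
# Assumed path prefixes for article pages; adjust after inspecting the real page.
ARTICLE_PREFIXES = ("/news/", "/sport/")


def filter_article_links(hrefs, base_url=BASE_URL, prefixes=ARTICLE_PREFIXES):
    """Return unique absolute URLs on base_url's host whose path starts with a prefix."""
    host = urlparse(base_url).netloc
    seen = set()
    result = []
    for href in hrefs:
        if not href:  # skip anchors without an href (None or "")
            continue
        absolute = urljoin(base_url, href)  # resolves relative links like "/news/..."
        parts = urlparse(absolute)
        if parts.netloc != host:  # drop links to other sites
            continue
        if not parts.path.startswith(prefixes):  # drop non-article sections
            continue
        if absolute not in seen:  # keep first occurrence only, preserving order
            seen.add(absolute)
            result.append(absolute)
    return result


# Hypothetical hrefs standing in for what soup.find_all('a') might yield.
sample = [
    "/news/world-123",
    "https://www.bbc.com/sport/football",
    None,
    "https://example.com/news/x",
    "/weather",
    "/news/world-123",
]
print(filter_article_links(sample))
```

With the soup object from the question, the same helper could be called as filter_article_links(a.get('href') for a in soup.find_all('a')).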


 