How do I get certain links from a website, but not all of them?
Here's what I have so far:
```python
import requests
from bs4 import BeautifulSoup

def linkScraper():
    html = requests.get("https://www.bbc.com/").text
    soup = BeautifulSoup(html, 'html.parser')
    # Print the href of every <a> tag on the page
    for link in soup.find_all('a'):
        print(link.get('href'))

linkScraper()
```
But this prints every single link on the website. How can I configure this to give me only the links to the articles that appear on the BBC's homepage?
You can filter them with a list comprehension:
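```python
# Test the href string, not the Tag itself (a Tag object has no startswith method)
links = [a['href'] for a in soup.find_all('a', href=True) if a['href'].startswith('https://www.bbc.com/')]
```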
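One caveat: links on a page like the BBC homepage may be relative (for example `/news/...`), and a plain `startswith('https://www.bbc.com/')` test would drop those. A minimal sketch that resolves each href against the base URL with `urllib.parse.urljoin`, strips `#fragment`s, removes duplicates and keeps only paths under chosen sections is shown below. The function name `filter_article_links` and the `("/news/", "/sport/")` prefixes are hypothetical; inspect the page to decide which paths actually correspond to articles.

```python
from urllib.parse import urldefrag, urljoin, urlparse

# Hypothetical defaults: the base URL is from the question, but the path
# prefixes are only a guess at what article URLs look like.
BASE_URL = "https://www.bbc.com/"
ARTICLE_PREFIXES = ("/news/", "/sport/")


def filter_article_links(hrefs, base_url=BASE_URL, prefixes=ARTICLE_PREFIXES):
    """Return absolute, de-duplicated URLs on base_url's host whose path
    starts with one of `prefixes`, in first-seen order."""
    host = urlparse(base_url).netloc
    seen = set()
    result = []
    for href in hrefs:
        if not href:
            continue  # skip empty or missing hrefs
        # Resolve relative links against the base URL, then drop any
        # "#fragment" so the same page is not listed twice.
        url, _fragment = urldefrag(urljoin(base_url, href))
        parsed = urlparse(url)
        if parsed.netloc != host:
            continue  # another host (or a javascript:/mailto: link)
        if not parsed.path.startswith(prefixes):
            continue  # not under one of the chosen sections
        if url not in seen:
            seen.add(url)
            result.append(url)
    return result


if __name__ == "__main__":
    sample = [
        "/news/world-12345",
        "https://www.bbc.com/news/world-12345#comments",
        "https://www.bbc.com/sport/football/67890",
        "/weather",
        "https://www.bbc.co.uk/news/uk-1",
        "",
        None,
    ]
    for link in filter_article_links(sample):
        print(link)
```

With the `soup` from the question it could be called as `filter_article_links(a.get('href') for a in soup.find_all('a'))`; `Tag.get` returns `None` for tags without an `href`, which the function skips. Keeping the filtering in a plain function over strings also makes it easy to test without fetching the page.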