How to get link and title using Beautiful Soup 4

html = \
"""<div class="slick-list"><div class="slick-track" style="width: 1380px; opacity: 1; transform: translate3d(0px, 0px, 0px);"><div data-index="0" class="slick-slide slick-active slick-current" tabindex="-1" aria-hidden="false" style="outline: none; width: 230px;"><div><div data-courseid="567828" class="course-discovery-unit--card-margin--2TVw4 merchandising-course-card--card--2UfMa"><a href="/course/complete-python-bootcamp/" data-purpose="merchandising-course-card-body-567828" target="_self" class="merchandising-course-card--mask--2-b-d"><div class="merchandising-course-card--card-header--89z8L"><img class="merchandising-course-card--course-image--3G7Kh" alt="" width="240" height="135" src="https://img-a.udemycdn.com/course/240x135/567828_67d0.jpg" srcset="https://img-a.udemycdn.com/course/240x135/567828_67d0.jpg 1x, https://img-a.udemycdn.com/course/480x270/567828_67d0.jpg 2x"></div><div class="merchandising-course-card--card-body--3OpAH"><div><div class="merchandising-course-card--course-title--2Ob4m" data-purpose="course-card-title">Complete Python Bootcamp: Go from zero to hero in Python 3</div>"""

I want to extract the link and the title. Expected output:

title=Complete Python Bootcamp: Go from zero to hero in Python 3
link=/course/complete-python-bootcamp/

Here is my code:

data=soup.findAll("div",{"class":"slick-list"})
print(data)

for link in data:
    for a in link.findAll("a"):
        print(a.title,a.href)

from bs4 import BeautifulSoup

html="""<div class="slick-list"><div class="slick-track" style="width: 1380px; opacity: 1; transform: translate3d(0px, 0px, 0px);"><div data-index="0" class="slick-slide slick-active slick-current" tabindex="-1" aria-hidden="false" style="outline: none; width: 230px;"><div><div data-courseid="567828" class="course-discovery-unit--card-margin--2TVw4 merchandising-course-card--card--2UfMa"><a href="/course/complete-python-bootcamp/" data-purpose="merchandising-course-card-body-567828" target="_self" class="merchandising-course-card--mask--2-b-d"><div class="merchandising-course-card--card-header--89z8L"><img class="merchandising-course-card--course-image--3G7Kh" alt="" width="240" height="135" src="https://img-a.udemycdn.com/course/240x135/567828_67d0.jpg" srcset="https://img-a.udemycdn.com/course/240x135/567828_67d0.jpg 1x, https://img-a.udemycdn.com/course/480x270/567828_67d0.jpg 2x"></div><div class="merchandising-course-card--card-body--3OpAH"><div><div class="merchandising-course-card--course-title--2Ob4m" data-purpose="course-card-title">Complete Python Bootcamp: Go from zero to hero in Python 3</div>"""

soup = BeautifulSoup(html, 'html.parser')

print('title='+soup.find("div",{"data-purpose":"course-card-title"}).text)
print('link='+soup.find("a").get('href'))

I hope this answers your question.
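
For anyone wondering why the loop in the question did not print the expected values: in Beautiful Soup, dotted access such as a.title or a.href searches for a child tag with that name, it does not read an HTML attribute. The sketch below shows the difference on a trimmed-down copy of the card markup (the shortened snippet and the variable names are my own, not from the original post).

```python
from bs4 import BeautifulSoup

# Trimmed-down version of the card markup from the question (shortened for illustration).
snippet = (
    '<a href="/course/complete-python-bootcamp/">'
    '<div data-purpose="course-card-title">Complete Python Bootcamp: '
    'Go from zero to hero in Python 3</div></a>'
)
a = BeautifulSoup(snippet, "html.parser").find("a")

# Dotted access looks for a child *tag* with that name, not an attribute.
child_title = a.title  # there is no <title> tag inside the link
child_href = a.href    # there is no <href> tag inside the link

# HTML attributes are read with .get() (or dictionary-style access, a["href"]).
link = a.get("href")
# The title is the text of the div marked with data-purpose="course-card-title".
title = a.find("div", {"data-purpose": "course-card-title"}).get_text(strip=True)

print(child_title, child_href)
print("title=" + title)
print("link=" + link)
```

Here child_title and child_href are both None, which is what the original print(a.title, a.href) was showing.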

A working solution based on your code (using findAll):

from bs4 import BeautifulSoup

html= """<div class="slick-list"><div class="slick-track" style="width: 1380px; opacity: 1; transform: translate3d(0px, 0px, 0px);"><div data-index="0" class="slick-slide slick-active slick-current" tabindex="-1" aria-hidden="false" style="outline: none; width: 230px;"><div><div data-courseid="567828" class="course-discovery-unit--card-margin--2TVw4 merchandising-course-card--card--2UfMa"><a href="/course/complete-python-bootcamp/" data-purpose="merchandising-course-card-body-567828" target="_self" class="merchandising-course-card--mask--2-b-d"><div class="merchandising-course-card--card-header--89z8L"><img class="merchandising-course-card--course-image--3G7Kh" alt="" width="240" height="135" src="https://img-a.udemycdn.com/course/240x135/567828_67d0.jpg" srcset="https://img-a.udemycdn.com/course/240x135/567828_67d0.jpg 1x, https://img-a.udemycdn.com/course/480x270/567828_67d0.jpg 2x"></div><div class="merchandising-course-card--card-body--3OpAH"><div><div class="merchandising-course-card--course-title--2Ob4m" data-purpose="course-card-title">Complete Python Bootcamp: Go from zero to hero in Python 3</div>"""

soup = BeautifulSoup(html, 'html.parser')

data=soup.findAll("div",{"class":"slick-list"})
#print(data)

for div in data:
    for a in div.findAll("a"):
        print(div.text,a.get('href'))
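
One caveat with the loop above: div.text is the text of the whole slick-list container, so if the slider holds several cards, their titles would likely run together. A possible way to keep each title paired with its own link is to look up the title inside each anchor. This is only a sketch: the two-card markup, the second course, and the helper name extract_courses are made up for illustration, and it assumes the title div always sits inside the a tag, as it does in the markup from the question.

```python
from bs4 import BeautifulSoup

# Hypothetical two-card markup; the second course is invented for illustration.
cards_html = """
<div class="slick-list">
  <a href="/course/complete-python-bootcamp/">
    <div data-purpose="course-card-title">Complete Python Bootcamp: Go from zero to hero in Python 3</div>
  </a>
  <a href="/course/example-course/">
    <div data-purpose="course-card-title">Example Course</div>
  </a>
</div>
"""


def extract_courses(markup):
    """Return a list of (title, link) tuples, one per card link."""
    soup = BeautifulSoup(markup, "html.parser")
    results = []
    # Every anchor with an href inside the slider container is treated as one card.
    for a in soup.select("div.slick-list a[href]"):
        title_div = a.select_one('[data-purpose="course-card-title"]')
        if title_div is None:
            continue  # skip links that carry no title element
        results.append((title_div.get_text(strip=True), a["href"]))
    return results


for title, link in extract_courses(cards_html):
    print("title=" + title)
    print("link=" + link)
```

Returning tuples rather than printing inside the loop makes it easier to reuse the result, for example to write the pairs to a CSV file.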
