from bs4 import BeautifulSoup
import requests
url = "https://www.brightscope.com/ratings"
headers = {'User-Agent':'Mozilla/5.0'}
page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.text, "html.parser")
data = soup.find_all('li',{"class":"more-data"})+soup.findAll('li', {"class":"more-data topten"})
for item in data:
    print(item('a'))
I would like to print only the hrefs, but I can't figure out how. I've watched several videos and still can't get it to work. What am I doing wrong? I know the code above prints the whole "a" tags, but I need just the href values.
You need to use dictionary-like access to the element's attributes:
[a['href'] for a in item('a')]
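As a rough sketch of how that fits into the loop, the snippet below runs against a small made-up HTML string rather than the real page (the markup and the /ratings/... paths are assumptions for illustration only). It uses a.get("href") instead of a["href"], which returns None rather than raising KeyError when an a tag has no href attribute:

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page.
html = """
<ul>
  <li class="more-data"><a href="/ratings/a">A</a></li>
  <li class="more-data topten">
    <a href="/ratings/b">B</a>
    <a name="anchor-only">no link here</a>
  </li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

hrefs = []
for item in soup.find_all("li", class_="more-data"):
    for a in item("a"):           # item("a") is shorthand for item.find_all("a")
        href = a.get("href")      # None if the tag has no href attribute
        if href is not None:
            hrefs.append(href)

print(hrefs)
```

If every a tag is guaranteed to carry an href, the shorter a["href"] form works just as well.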
As a side note, you can improve the way you locate the li elements. Instead of:
data = soup.find_all('li',{"class":"more-data"})+soup.findAll('li', {"class":"more-data topten"})
for item in data:
    print(item('a'))
You can do:
links = soup.select("li.more-data a")
for a in links:
    print(a["href"])
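where li.more-data a is a CSS selector that matches all a elements inside li elements with the more-data class. It also covers li elements whose class attribute is "more-data topten", because a class selector matches any element that has more-data among its classes.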
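If the hrefs turn out to be relative paths, they may need to be resolved against the page URL before use. The following is a minimal sketch under that assumption; the HTML string is invented for illustration, and the a[href] part of the selector simply skips a tags that have no href attribute:

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

base_url = "https://www.brightscope.com/ratings"

# Hypothetical markup; the real page structure may differ.
html = """
<ul>
  <li class="more-data"><a href="/ratings/a">A</a></li>
  <li class="more-data topten"><a href="https://example.com/b">B</a></li>
  <li class="other"><a href="/ignored">ignored</a></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")

# Match only <a> tags that have an href and sit inside li.more-data,
# then turn each href into an absolute URL.
links = [urljoin(base_url, a["href"]) for a in soup.select("li.more-data a[href]")]

for link in links:
    print(link)
```

urljoin leaves already-absolute URLs unchanged, so mixing relative and absolute hrefs is fine.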