I'm testing Beautiful Soup with Python (it's great, for those who are wondering).
I have a problem when I try to get the href from a link I found, and I don't understand why I can't get it.
Here is my code:
for url in soup.find_all('article'):
    if "Gonz Logo" in url.get_text():
        if "Black" in url.get_text():
            print(url)
This works, but it gives me this:
<article><div class="inner-article"><a href="/shop/jackets/gw1diqgyr/n53istanq" style="height:150px;"><img alt="N7qmqyee 3g" height="150" src="//assets.supremenewyork.com/147789/vi/N7qMqyEe_3g.jpg" width="150"/></a><h1><a class="name-link" href="/shop/jackets/gw1diqgyr/n53istanq">Gonz Logo Coaches Jacket </a></h1><p><a class="name-link" href="/shop/jackets/gw1diqgyr/n53istanq">Black</a></p></div></article>
(Yeah, a big line...)
The problem is that I only want to get the href. When I try:
print(url.get('href'))
the output is: None
I have no idea why.
Thank you for your answers!
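I think you get None because soup.find_all('article') returns the <article> tags, and an <article> tag has no href attribute of its own. So when you call url.get('href') on it, you don't get the link. The href is on the <a> tags nested inside the article.
To get the links, I would recommend finding all a tags whose href matches a regex, e.g.: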
import re

# find every <a> tag whose href contains at least one of these characters
links = soup.find_all('a', attrs={'href': re.compile('[a-zA-Z0-9_()]')})
# now iterate over the links
for link in links:
    # get the URL
    url = link.get('href')
    print(url)
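To illustrate why the original call printed None, here is a minimal sketch. It assumes a trimmed copy of the markup from the question and the built-in "html.parser" backend; the variable names are only illustrative.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup printed in the question (the <img> is left out).
html = (
    '<article><div class="inner-article">'
    '<a href="/shop/jackets/gw1diqgyr/n53istanq" style="height:150px;"></a>'
    '<h1><a class="name-link" href="/shop/jackets/gw1diqgyr/n53istanq">'
    'Gonz Logo Coaches Jacket </a></h1>'
    '<p><a class="name-link" href="/shop/jackets/gw1diqgyr/n53istanq">Black</a></p>'
    '</div></article>'
)

soup = BeautifulSoup(html, "html.parser")
article = soup.find("article")

# .get() looks only at the attributes of the tag it is called on.
# <article> has no href attribute, so this is None.
article_href = article.get("href")

# The href belongs to the nested <a> tags, so look one of them up first.
first_link = article.find("a")
link_href = first_link.get("href")

print(article_href)
print(link_href)
```

The first print shows None and the second shows the link, because get() never searches descendants.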
Can you try this?
for url in soup.find_all('article'):
    if "Gonz Logo" in url.get_text():
        if "Black" in url.get_text():
            for child_a in url.find_all('a'):
                print(child_a['href'])
Slightly modifying Ali Yilmaz's solution as follows (adding href=True, so that only a tags that actually have an href attribute are returned):
for url in soup.find_all('article'):
    if "Gonz Logo" in url.get_text():
        if "Black" in url.get_text():
            for child_a in url.find_all('a', href=True):
                print(child_a['href'])
It works fine.
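One thing to note: in the markup from the question, all three a tags inside the article carry the same href, so the loop prints the same link three times. A possible way to keep each link once is sketched below. The helper name matching_hrefs, its keywords parameter, and the second (red) article are hypothetical and only there for illustration.

```python
from bs4 import BeautifulSoup


def matching_hrefs(html, keywords):
    """Return de-duplicated hrefs from <article> tags whose text contains every keyword."""
    soup = BeautifulSoup(html, "html.parser")
    hrefs = []
    for article in soup.find_all("article"):
        text = article.get_text()
        if all(word in text for word in keywords):
            # href=True skips <a> tags that have no href attribute
            for a in article.find_all("a", href=True):
                if a["href"] not in hrefs:
                    hrefs.append(a["href"])
    return hrefs


# First article: trimmed from the question. Second article: made up for this sketch.
sample = (
    '<article><div class="inner-article">'
    '<a href="/shop/jackets/gw1diqgyr/n53istanq"></a>'
    '<h1><a class="name-link" href="/shop/jackets/gw1diqgyr/n53istanq">'
    'Gonz Logo Coaches Jacket </a></h1>'
    '<p><a class="name-link" href="/shop/jackets/gw1diqgyr/n53istanq">Black</a></p>'
    '</div></article>'
    '<article><div class="inner-article">'
    '<h1><a class="name-link" href="/shop/jackets/example/other">'
    'Gonz Logo Coaches Jacket </a></h1>'
    '<p><a class="name-link" href="/shop/jackets/example/other">Red</a></p>'
    '</div></article>'
)

result = matching_hrefs(sample, ["Gonz Logo", "Black"])
print(result)
```

Only the black jacket's link is returned, and only once, because the red article fails the "Black" check and repeated hrefs are skipped.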