Python Beautiful Soup: extract href

I'm testing Beautiful Soup with Python (it's great, for those who are wondering).

I have a problem when I try to get the href from a link I found, and I don't understand why I can't get it.

Here is my code:

for url in soup.find_all('article'):
    if "Gonz Logo" in url.get_text():
        if "Black" in url.get_text():
            print(url)

This works, but it gives me this:

<article><div class="inner-article"><a href="/shop/jackets/gw1diqgyr/n53istanq" style="height:150px;"><img alt="N7qmqyee 3g" height="150" src="//assets.supremenewyork.com/147789/vi/N7qMqyEe_3g.jpg" width="150"/></a><h1><a class="name-link" href="/shop/jackets/gw1diqgyr/n53istanq">Gonz Logo Coaches Jacket </a></h1><p><a class="name-link" href="/shop/jackets/gw1diqgyr/n53istanq">Black</a></p></div></article>

(Yes, a long line...)

The problem is that I only want the href. When I try:

    print(url.get('href'))

the output is: None

I have no idea why.

Thank you for your answers!

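I think you get None because soup.find_all('article') returns the article tags, and an article tag has no href attribute of its own; the href is on the a tags nested inside it. So url.get('href') doesn't give you the link.

To get the link, I would recommend getting all the a tags with a regex filter on href, e.g.: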
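To illustrate the point, here is a minimal sketch, assuming a trimmed copy of the question's HTML (the `html` string and the variable names are illustrative, not from the original code). `Tag.get` only looks at the tag's own attributes, so it returns None for the article and the real value for the nested a tag:

```python
from bs4 import BeautifulSoup

# Trimmed copy of the markup from the question (assumption: one link is enough to show the effect).
html = (
    '<article><div class="inner-article">'
    '<a class="name-link" href="/shop/jackets/gw1diqgyr/n53istanq">Black</a>'
    '</div></article>'
)
soup = BeautifulSoup(html, "html.parser")
article = soup.find("article")

# <article> itself carries no href attribute, so .get() returns None.
own_href = article.get("href")

# The href lives on the <a> descendant; find() returns the first match.
first_link = article.find("a")
nested_href = first_link.get("href")

print(own_href)
print(nested_href)
```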

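import re

# keep only <a> tags whose href contains at least one of these characters
# (find_all is the current name for the older findAll alias)
links = soup.find_all('a', attrs={'href': re.compile('[a-zA-Z0-9_()]')})
# iterate over the matching links
for link in links:
    # get the URL from the tag's href attribute
    url = link.get('href')
    print(url)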

Can you try this?

for url in soup.find_all('article'):
    if "Gonz Logo" in url.get_text():
        if "Black" in url.get_text():
            for child_a in url.find_all('a'):
                print(child_a['href'])
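Note that in the HTML from the question, all three a tags inside the article carry the same href, so the loop above prints the same path three times. If one link per article is enough, one option is to de-duplicate the values. A minimal sketch, assuming a trimmed copy of the question's markup (the `html` string and the `unique_hrefs` name are illustrative):

```python
from bs4 import BeautifulSoup

# Trimmed copy of the <article> from the question.
html = (
    '<article><div class="inner-article">'
    '<a href="/shop/jackets/gw1diqgyr/n53istanq"><img alt="N7qmqyee 3g"/></a>'
    '<h1><a class="name-link" href="/shop/jackets/gw1diqgyr/n53istanq">Gonz Logo Coaches Jacket </a></h1>'
    '<p><a class="name-link" href="/shop/jackets/gw1diqgyr/n53istanq">Black</a></p>'
    '</div></article>'
)
soup = BeautifulSoup(html, "html.parser")

unique_hrefs = []
for article in soup.find_all("article"):
    text = article.get_text()
    if "Gonz Logo" in text and "Black" in text:
        hrefs = [a["href"] for a in article.find_all("a")]
        # dict.fromkeys keeps first-seen order while dropping repeats
        unique_hrefs.extend(dict.fromkeys(hrefs))

print(unique_hrefs)
```

If only the first link matters, `article.find("a")` would also do, since it returns the first matching tag.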

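Slightly modifying Ali Yilmaz's solution by adding href=True:

for url in soup.find_all('article'):
    if "Gonz Logo" in url.get_text():
        if "Black" in url.get_text():
            for child_a in url.find_all('a', href=True):
                print(child_a['href'])

This works fine. href=True restricts the match to a tags that actually have an href attribute, so child_a['href'] cannot fail on a tag without one.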
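A small sketch of what href=True changes, using a made-up two-link snippet (the `html` string, including the a tag without an href, is an assumption added for illustration and does not come from the question's page):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the first <a> is a named anchor with no href.
html = '<article><a name="top">Anchor</a><a href="/shop/x">Black</a></article>'
soup = BeautifulSoup(html, "html.parser")
article = soup.find("article")

all_links = article.find_all("a")             # both <a> tags
with_href = article.find_all("a", href=True)  # only tags that have an href

hrefs = [a["href"] for a in with_href]

# Subscript access on a tag without the attribute raises KeyError.
try:
    all_links[0]["href"]
    missing_raises = False
except KeyError:
    missing_raises = True

print(len(all_links), len(with_href), hrefs, missing_raises)
```

Using `a.get("href")` instead of `a["href"]` is another way to avoid the exception, at the cost of having to skip None values.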
