beautifulsoup4: get the href from anchor elements with a specific attribute value

I'm trying to use BeautifulSoup4 to extract the href value from multiple anchor elements on a page that have the attribute itemprop with the value url.

For example, I want to extract /pages/page from <a itemprop="url" href="/pages/page"></a>. There are multiple of these elements on one page, so I'd like to collect them in an array (a Python list).

I was thinking of something like this: soup("span", html = True, {'itemprop' : 'name' })

With find_all() you can search the parsed document for specific tags, and in your case this is quite easy. If the first argument is a string, find_all() returns only the tags with that name, so soup.find_all("a") finds all anchor tags.
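To see the name-only match in action, here is a minimal sketch. The HTML snippet is made up for illustration, and the built-in "html.parser" parser is an assumption (any installed parser would do). It also shows that calling the soup object directly, as in your soup(...) attempt, is shorthand for soup.find_all(...).

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup, made up for illustration.
html = """
<div>
  <a itemprop="url" href="/pages/page">Page</a>
  <a href="/about">About</a>
  <span itemprop="name">Name</span>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# A string as the first argument matches tags by name only,
# so both anchors are returned and the span is not.
all_anchors = soup.find_all("a")

# Calling the soup object directly is shorthand for find_all().
same_anchors = soup("a")

print(len(all_anchors))             # 2
print(all_anchors == same_anchors)  # True
```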

find_all() also accepts (almost) any keyword argument as an attribute filter to narrow the selection further. You want the attribute itemprop to equal url, so you can write exactly that: soup.find_all("a", itemprop="url").
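The sketch below compares a few equivalent ways to express the same filter, again on hypothetical sample markup. The attrs dictionary form is the one to reach for when an attribute name can't be written as a Python keyword argument (for example data-* attributes), and it is also what your soup(..., {'itemprop': ...}) idea maps to, since attrs is find_all()'s second positional parameter. The CSS selector variant assumes the soupsieve package is available; recent beautifulsoup4 releases install it as a dependency.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: two matching anchors and two non-matching tags.
html = """
<a itemprop="url" href="/pages/page">One</a>
<a itemprop="url" href="/pages/other">Two</a>
<a href="/about">About</a>
<span itemprop="name">Name</span>
"""

soup = BeautifulSoup(html, "html.parser")

# Keyword argument: itemprop="url" becomes an attribute filter.
by_keyword = soup.find_all("a", itemprop="url")

# Dict form passed as attrs; also works for names like data-foo.
by_attrs = soup.find_all("a", attrs={"itemprop": "url"})

# Positional dict: the second positional argument of find_all() is attrs.
by_positional = soup("a", {"itemprop": "url"})

# CSS attribute selector (relies on soupsieve).
by_css = soup.select('a[itemprop="url"]')

print(len(by_keyword))  # 2
```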

This returns a list of tags (a ResultSet, which is a subclass of list). To extract the href attribute from each of those tags, use tag.get("href"). The end result looks something like this:

hrefs = [tag.get("href") for tag in soup.find_all("a", itemprop="url")]
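Here is a complete, runnable version of that one-liner: a sketch built on made-up markup in which one itemprop="url" anchor has no href. It shows that tag.get("href") yields None for such an anchor (where tag["href"] would raise KeyError), plus one way to skip those anchors by adding href=True as an extra attribute filter, which matches any tag that has the attribute regardless of its value. The variable names are illustrative.

```python
from bs4 import BeautifulSoup

# Made-up page: the third itemprop="url" anchor has no href attribute.
html = """
<ul>
  <li><a itemprop="url" href="/pages/page">First</a></li>
  <li><a itemprop="url" href="/pages/second">Second</a></li>
  <li><a itemprop="url">No link</a></li>
  <li><a href="/ignored">Not itemprop</a></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")
anchors = soup.find_all("a", itemprop="url")

# tag.get() returns None for a missing attribute instead of raising KeyError,
# so the list keeps one entry per matching anchor, in document order.
hrefs = [tag.get("href") for tag in anchors]
print(hrefs)  # ['/pages/page', '/pages/second', None]

# href=True keeps only anchors that actually have an href,
# so indexing with tag["href"] is safe here.
hrefs_only = [tag["href"] for tag in soup.find_all("a", itemprop="url", href=True)]
print(hrefs_only)  # ['/pages/page', '/pages/second']
```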
