
Find all tags with href attribute using selenium

I want to find every tag on a webpage that has an 'href' attribute, using Selenium. The methods I have found, such as find_element_by_*(), are tag-specific. I want the code to be tag-independent, i.e. to get all tags that have an href attribute. Any help would be appreciated!

You can use an XPath expression or a CSS selector that matches on the attribute instead of the tag name.

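>>> # find_elements_by_* was removed in Selenium 4.3; use find_elements(By..., ...)
>>> from selenium.webdriver.common.by import By
>>> driver.get('https://www.python.org')
>>>
>>> len(driver.find_elements(By.CSS_SELECTOR, '[href]'))
241
>>> len(driver.find_elements(By.XPATH, '//*[@href]'))
241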

https://selenium-python.readthedocs.io/locating-elements.html#locating-by-xpath

https://selenium-python.readthedocs.io/locating-elements.html#locating-elements-by-css-selectors
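Once the elements are located, you will usually want the tag name and the href value of each one. Here is a minimal sketch, assuming it is given the list returned by either locator call above. `collect_hrefs` is a hypothetical helper name, and it relies only on the `tag_name` property and the `get_attribute()` method of Selenium's WebElement.

```python
def collect_hrefs(elements):
    """Return (tag_name, href) pairs for elements with a non-empty href."""
    pairs = []
    for el in elements:
        # get_attribute returns None when the attribute is absent
        href = el.get_attribute('href')
        if href:
            pairs.append((el.tag_name, href))
    return pairs
```

Because the helper only reads `tag_name` and `get_attribute('href')`, it works the same way whether the elements were found by XPath or by CSS selector.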

Try this:

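from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('https://google.com')
# Match every element, whatever its tag, that carries an href attribute.
elements = driver.find_elements(By.XPATH, '//*[@href]')
all_tags = [el.tag_name for el in elements]
print(all_tags)
driver.quit()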
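The all_tags list will usually contain many repeats, so a count per tag name is often easier to read. This is a small sketch, assuming all_tags is a plain list of strings as built above. `count_tags` is a hypothetical helper, and the sample list is made up for illustration.

```python
from collections import Counter


def count_tags(tag_names):
    """Return (tag, count) pairs, most frequent first."""
    return Counter(tag_names).most_common()


# Illustrative input only; a real page yields a different list.
sample = ['a', 'link', 'a', 'a', 'link', 'base']
print(count_tags(sample))  # [('a', 3), ('link', 2), ('base', 1)]
```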

This answer is based on the following post: https://stackoverflow.com/a/27307235/15303240

The Selenium WebDriver API has no method that returns all of an element's attributes, but you can execute JavaScript to collect them:

driver.execute_script('var items = {}; for (var index = 0; index < arguments[0].attributes.length; ++index) { items[arguments[0].attributes[index].name] = arguments[0].attributes[index].value }; return items;', element)

Demo:

>>> from selenium import webdriver
>>> from selenium.webdriver.common.by import By
>>> from pprint import pprint
>>> driver = webdriver.Firefox()
>>> driver.get('https://stackoverflow.com')
>>> 
>>> element = driver.find_element(By.XPATH, '//div[@class="network-items"]/a')
>>> attrs = driver.execute_script('var items = {}; for (var index = 0; index < arguments[0].attributes.length; ++index) { items[arguments[0].attributes[index].name] = arguments[0].attributes[index].value }; return items;', element)
>>> pprint(attrs)
{u'class': u'topbar-icon icon-site-switcher yes-hover js-site-switcher-button js-gps-track',
 u'data-gps-track': u'site_switcher.show',
 u'href': u'//stackexchange.com',
 u'title': u'A list of all 132 Stack Exchange sites'}

For completeness' sake, an alternative is to get the element's outerHTML and parse the attributes with an HTML parser. Example (using BeautifulSoup):

>>> from bs4 import BeautifulSoup
>>> html = element.get_attribute('outerHTML')
>>> attrs = BeautifulSoup(html, 'html.parser').a.attrs
>>> pprint(attrs)
{u'class': [u'topbar-icon',
            u'icon-site-switcher',
            u'yes-hover',
            u'js-site-switcher-button',
            u'js-gps-track'],
 u'data-gps-track': u'site_switcher.show',
 u'href': u'//stackexchange.com',
 u'title': u'A list of all 132 Stack Exchange sites'}
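If you would rather not install BeautifulSoup, the same outerHTML approach can be sketched with the standard library's `html.parser` module. This is an illustrative variant, not part of the original answer: `FirstTagAttrs` and `parse_attrs` are hypothetical names, and the sample HTML string is a shortened stand-in for what `element.get_attribute('outerHTML')` would return.

```python
from html.parser import HTMLParser


class FirstTagAttrs(HTMLParser):
    """Record the attributes of the first start tag encountered."""

    def __init__(self):
        super().__init__()
        self.tag = None
        self.attrs = None

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs
        if self.attrs is None:
            self.tag = tag
            self.attrs = dict(attrs)


def parse_attrs(outer_html):
    """Return the attribute dict of the outermost element in outer_html."""
    parser = FirstTagAttrs()
    parser.feed(outer_html)
    parser.close()
    return parser.attrs or {}


# Shortened stand-in for element.get_attribute('outerHTML')
html = '<a class="topbar-icon js-gps-track" href="//stackexchange.com" title="Sites">x</a>'
print(parse_attrs(html))
```

Unlike BeautifulSoup, this sketch leaves the class attribute as a single space-separated string rather than splitting it into a list.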
