
Finding tags nested inside other tags with Selenium

I need to find certain links on a page, but the "a" tags have no class or id. They do, however, sit inside "span" elements with the classes "ipsContained ipsType_break". I would like to find all of those "span" elements first, and then the "a" tags inside them. Can anyone tell me how to do this, or suggest a simpler option?

I use Selenium. Here is a sample of the HTML that contains the links to fetch.

<!DOCTYPE html>
<html>
<head>
<title>Page Title</title>
<script src="./interference.js"></script>
</head>
<body>
  <span class = "ipsContained ipsType_break">
    <a href="link1"></a>
    <a href="link2"></a>
  </span>
  <span class = "ipsContained ipsType_break">
    <a href="link3"></a>
    <a href="link4"></a>
  </span>
</body>
</html>

I use BeautifulSoup to parse html.

from bs4 import BeautifulSoup
html = """<!DOCTYPE html>
<html>
<head>
<title>Page Title</title>
<script src="./interference.js"></script>
</head>
<body>
  <span class = "ipsContained ipsType_break">
    <a href="link1"></a>
    <a href="link2"></a>
  </span>
  <span class = "ipsContained ipsType_break">
    <a href="link3"></a>
    <a href="link4"></a>
  </span>
</body>
</html>"""
soup = BeautifulSoup(html, 'html.parser')
# Find every span whose class attribute is "ipsContained ipsType_break"
spans = soup.find_all("span", {"class": "ipsContained ipsType_break"})
links = []
for span in spans:
    # Keep only the a tags that actually carry an href attribute
    for a in span.find_all("a", href=True):
        links.append(a["href"])
print(links)

Prints: ['link1', 'link2', 'link3', 'link4']
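Note that BeautifulSoup returns each href exactly as it is written in the markup, so relative links such as link1 stay relative, whereas Selenium's get_attribute('href') typically gives back the resolved absolute URL. If absolute URLs are needed from the BeautifulSoup result, a minimal sketch is to join each value with the page address using urljoin. The base_url below is a hypothetical placeholder, not something from the question:

```python
from urllib.parse import urljoin

# Hypothetical page address; replace it with the URL the HTML was fetched from.
base_url = "https://example.com/forum/"

# The relative hrefs collected by the BeautifulSoup loop above.
links = ["link1", "link2", "link3", "link4"]

# Resolve every relative href against the page address.
absolute_links = [urljoin(base_url, href) for href in links]
print(absolute_links)
```

Hrefs that are already absolute pass through urljoin unchanged, so the same line is safe for a mix of relative and absolute links.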

Selenium-based solution:

You can construct an XPath for the span tag like this:

//span[@class='ipsContained ipsType_break']

You can store the matching spans in a list, then get the child a tags of each span with a relative XPath that starts with a dot (for example .//a), and read each link with the get_attribute method.

Code:

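from selenium.webdriver.common.by import By

# driver is assumed to be an already-initialised WebDriver with the page loaded
spans = driver.find_elements(By.XPATH, "//span[@class='ipsContained ipsType_break']")
a_tag_list = []
for span in spans:
    # find_elements (plural) returns every a tag in the span, not just the first
    for atag in span.find_elements(By.XPATH, ".//a"):
        href = atag.get_attribute('href')
        print(href)
        a_tag_list.append(href)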

Another approach, which avoids appending to a list in a loop, is a single XPath inside a list comprehension:

links = [x.get_attribute("href") for x in driver.find_elements(By.XPATH, "//span[@class='ipsContained ipsType_break']//a")]

This should get you the href of every a tag inside those spans.
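The span-then-a lookup can also be tried without a browser. The sketch below is not Selenium: it uses the standard library's xml.etree.ElementTree, which supports only a subset of XPath and needs well-formed markup, so it runs on a cleaned-up copy of the sample. It is meant only to illustrate how the relative .//a path behaves and why a first-match lookup (find here, find_element in Selenium) returns one link per span while an all-matches lookup (findall here, find_elements in Selenium) returns them all:

```python
import xml.etree.ElementTree as ET

# Well-formed copy of the sample markup (ElementTree requires valid XML).
doc = ET.fromstring("""<html>
<body>
  <span class="ipsContained ipsType_break">
    <a href="link1"></a>
    <a href="link2"></a>
  </span>
  <span class="ipsContained ipsType_break">
    <a href="link3"></a>
    <a href="link4"></a>
  </span>
</body>
</html>""")

# Same predicate as the Selenium XPath, written relative to the root element.
spans = doc.findall(".//span[@class='ipsContained ipsType_break']")

# find() stops at the first a in each span.
first_only = [span.find(".//a").get("href") for span in spans]

# findall() returns every a in each span.
all_links = [a.get("href") for span in spans for a in span.findall(".//a")]

print(first_only)  # ['link1', 'link3']
print(all_links)   # ['link1', 'link2', 'link3', 'link4']
```

The @class='...' predicate compares the whole attribute string, so it stops matching if the page lists the classes in a different order or adds another class. In Selenium, a CSS selector such as span.ipsContained.ipsType_break a (used with By.CSS_SELECTOR) matches on the individual class names instead.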
