Visit all links and sublinks on a webpage using Selenium and Python

I have a webpage that starts with 4 links; each link has 2 or more links, those links in turn have 1 or 2 links, and so on. How can I visit all the links and nested links using Selenium and Python?

All the links have the same relative XPath.

I have tried the code below, but it's not working:

import time

urls = {}

def visit_children(locator_path):
    # get_children() and click_func() are helper functions defined elsewhere;
    # driver is the active Selenium WebDriver instance
    children = get_children(locator_path)
    time.sleep(1)
    if children == 1:
        click_func(locator_path)
        visit_children(locator_path)
    elif children > 1:
        print(children)
        time.sleep(2)
        url = driver.current_url
        print(url)
        urls[url] = children
        print(urls)
        for i in range(children):
            child_elements = driver.find_elements_by_xpath(locator_path)
            child_elements[i].click()
            time.sleep(2)
            visit_children(locator_path)
    else:
        for link, no_elements in urls.items():
            if urls[driver.current_url] > 0:
                driver.get(link)
                time.sleep(1)
                urls[driver.current_url] -= 1
                print(urls)
                time.sleep(2)

I think what you want to do is implement a crawler. To do that you will need two data structures: one that tells you which URLs you have already visited, and another where you dump the URLs you extract based on your criteria.

The crawler function just needs to pop the first URL from the list of URLs, check whether it's a URL you have already crawled, and if not, crawl it.

Something like this:

visited = {}
urls = ['initial_url']

while len(urls) > 0:
    url = urls.pop()
    if url in visited:   # visited[url] would raise KeyError for unseen URLs
        continue
    visited[url] = 1     # mark the page as crawled
    crawl(url)           # crawl() should append newly found URLs to urls
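The snippet leaves crawl undefined. A minimal sketch of what it could look like, assuming a Selenium driver is already running and using the Selenium 4 find_elements(By.XPATH, ...) API (older Selenium 3 code would call find_elements_by_xpath instead); the XPath here is a generic placeholder for whatever relative XPath your links share:

from selenium.webdriver.common.by import By

def crawl(url):
    driver.get(url)
    # collect the href of every anchor on the page; swap the XPath below
    # for the relative XPath your links share
    for anchor in driver.find_elements(By.XPATH, "//a[@href]"):
        href = anchor.get_attribute("href")
        if href and href not in visited:
            urls.append(href)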

Notice that sets have O(1) membership lookup, so you might want to use one to quickly confirm whether you have already visited a URL, while a FIFO queue is a great way to store the extracted URLs (popping from the front of a plain list is O(n), so collections.deque is the usual choice).
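Putting it all together, a sketch of the full loop with a set for the visited check and collections.deque for an O(1) FIFO queue; the browser, start URL, and XPath are placeholder assumptions you would swap for your own:

from collections import deque

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()           # assumes Chrome; any WebDriver works

visited = set()                       # O(1) membership checks
urls = deque(["http://example.com"])  # placeholder start URL

while urls:
    url = urls.popleft()              # FIFO, so pages are crawled breadth-first
    if url in visited:
        continue
    visited.add(url)
    driver.get(url)
    # same extraction as crawl() above, inlined
    for anchor in driver.find_elements(By.XPATH, "//a[@href]"):
        href = anchor.get_attribute("href")
        if href and href not in visited:
            urls.append(href)

driver.quit()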
