Scraping multiple pages with a static URL

Question

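I asked a similar question earlier about navigating multiple pages of https://ethnicelebs.com/all-celeb, which keeps a static URL, and thank you for the help! Now I want to scrape the ethnicity information for every celebrity listed by clicking each name. I can navigate through all the pages, but my code always scrapes the information from the first page.

I tried the following: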

url = 'https://ethnicelebs.com/all-celeb'
driver = webdriver.Chrome()
driver.get(url)
while True:

    page = requests.post('https://ethnicelebs.com/all-celebs')
    soup = BeautifulSoup(page.text, 'html.parser')
    for href in soup.find_all('a', href=True)[18:]:
        print('Found the URL:{}'.format(href['href']))
        request_href = requests.get(href['href'])
        soup2 = BeautifulSoup(request_href.content)
        for each in soup2.find_all('strong')[:-1]:
            print(each.text)

    Next_button = (By.XPATH, "//*[@title='Go to next page']")
    WebDriverWait(driver, 50).until(EC.element_to_be_clickable(Next_button)).click()
    url = driver.current_url
    time.sleep(5)

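(Thanks, @Sureshmani!)

I expected the code to scrape every page as it navigates, not just the first one. How can I scrape the current page while continuing to navigate? Thanks!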

Answer 1

I misunderstood your question because of the nested loop in my previous answer. The following code will work:

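import time

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

url = 'https://ethnicelebs.com/all-celeb'
driver = webdriver.Chrome()         # stays on the list and clicks through its pages
detail_driver = webdriver.Chrome()  # opens each celebrity page, so the list keeps its place
driver.get(url)  # load the list once; reloading a static URL would return to page 1
while True:
    # Parse the list page the browser is showing right now
    soup = BeautifulSoup(driver.page_source, 'html.parser')
    for href in soup.find_all('a', href=True)[18:]:  # same slice as in the question
        print('Found the URL:{}'.format(href['href']))
        detail_driver.get(href['href'])
        soup2 = BeautifulSoup(detail_driver.page_source, 'html.parser')
        for each in soup2.find_all('strong')[:-1]:
            print(each.text)

    # Click "next" on the list; the wait raises TimeoutException once no next page exists
    Next_button = (By.XPATH, "//*[@title='Go to next page']")
    WebDriverWait(driver, 50).until(EC.element_to_be_clickable(Next_button)).click()
    time.sleep(5)  # give the next page time to load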

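In your code, you send only one request through Selenium at the start and then use requests for everything after that. Because the URL is static, each requests call fetches the first page again, no matter which page the browser has moved to. To navigate and scrape at the same time, you should use only Selenium, as in the example above.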
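The underlying pattern can be sketched without a browser: on every iteration, read the HTML the browser is showing at that moment, then advance. This is only an illustration under stated assumptions. `LinkCollector`, `extract_links`, `iter_all_links` and `FakeBrowser` are hypothetical names, not part of Selenium or BeautifulSoup, and the standard-library `HTMLParser` stands in for `soup.find_all('a', href=True)` so that the sketch runs on its own.

```python
from html.parser import HTMLParser
from typing import Callable, Iterator, List


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag that has one."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_links(html: str) -> List[str]:
    """Return all link targets found in one page of HTML."""
    collector = LinkCollector()
    collector.feed(html)
    return collector.links


def iter_all_links(get_source: Callable[[], str],
                   go_next: Callable[[], bool]) -> Iterator[str]:
    """Yield links from the current page, then advance until go_next() is False."""
    while True:
        # Read what the browser shows now, not a fresh request to the static URL.
        for link in extract_links(get_source()):
            yield link
        if not go_next():
            break


class FakeBrowser:
    """Hypothetical stand-in for a Selenium driver whose URL never changes."""

    def __init__(self, pages: List[str]) -> None:
        self.pages = pages
        self.index = 0

    def page_source(self) -> str:
        return self.pages[self.index]

    def click_next(self) -> bool:
        if self.index + 1 >= len(self.pages):
            return False  # no next-page button left
        self.index += 1
        return True
```

With Selenium, `get_source` could be `lambda: driver.page_source`, and `go_next` could be a function that clicks the next-page button and returns False once the wait times out.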

Solution 1
0 2019-07-26 02:09:06
