
How to save data from multiple pages using webdriver into a single CSV

So I'm trying to save data from Google Scholar using Selenium (webdriver). So far I can print the data that I want, but when I save it into a CSV it only saves the first page.

from selenium import webdriver
from selenium.webdriver.common.by import By
# Import statements for explicit wait
from selenium.webdriver.support.ui import WebDriverWait as W
from selenium.webdriver.support import expected_conditions as EC
import time
import csv
from csv import writer

exec_path = r"C:\Users\gvste\Desktop\proyecto\chromedriver.exe"
URL = r"https://scholar.google.com/citations?view_op=view_org&hl=en&authuser=2&org=8337597745079551909"

button_locators = ['//*[@id="gsc_authors_bottom_pag"]/div/button[2]', '//*[@id="gsc_authors_bottom_pag"]/div/button[2]','//*[@id="gsc_authors_bottom_pag"]/div/button[2]']
wait_time = 3
driver = webdriver.Chrome(executable_path=exec_path)
driver.get(URL)
wait = W(driver, wait_time)
#driver.maximize_window()
for j in range(len(button_locators)):
    button_link = wait.until(EC.element_to_be_clickable((By.XPATH, button_locators[j])))

address = driver.find_elements_by_class_name("gsc_1usr")

#for post in address:
#    print(post.text)
time.sleep(4)

with open('post.csv','a') as s:
    for i in range(len(address)):
        addresst = address
        #if addresst == 'NONE':
        #    addresst = str(address)
        #else:
        addresst = address[i].text.replace('\n',',')
        s.write(addresst + '\n')

button_link.click()
time.sleep(4)

#driver.quit()

You only get the first page's data because your program stops right after it clicks the next-page button. You have to put all of that inside a for loop.

Notice I wrote range(7) because I know there are 7 pages to open; in reality we should never do that. Imagine if we had thousands of pages. We should add some logic to check whether the "next page" button still exists (or is still enabled) and loop until it doesn't.

exec_path = r"C:\Users\gvste\Desktop\proyecto\chromedriver.exe"
URL = r"https://scholar.google.com/citations?view_op=view_org&hl=en&authuser=2&org=8337597745079551909"

button_locators = "/html/body/div/div[8]/div[2]/div/div[12]/div/button[2]"
wait_time = 3
driver = webdriver.Chrome(executable_path=exec_path)
driver.get(URL)
wait = W(driver, wait_time)

time.sleep(4)

# 7 pages. In reality, we should get this number programmatically 
for page in range(7):

    # read data from new page
    address = driver.find_elements_by_class_name("gsc_1usr")

    # write to file
    with open('post.csv','a') as s:
        for i in range(len(address)):
            addresst = address[i].text.replace('\n',',')
            s.write(addresst+ '\n')

    # find and click next page button
    button_link = wait.until(EC.element_to_be_clickable((By.XPATH, button_locators)))
    button_link.click()
    time.sleep(4)
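As a sketch of the "loop until the next-page button is gone" idea mentioned above, the pagination logic can be separated from Selenium entirely. The Selenium callbacks below are commented assumptions (in particular, that Google Scholar marks the last page by setting the button's `disabled` attribute); the loop itself runs stand-alone. This version also uses `csv.writer` (imported but unused in the question), which quotes fields correctly even when they contain commas:

```python
import csv

def scrape_all_pages(get_rows, next_page, out_path):
    """Write rows from every page to out_path as proper CSV.
    next_page() should return False when there is no further page."""
    with open(out_path, 'w', newline='') as f:
        w = csv.writer(f)
        while True:
            for row in get_rows():
                w.writerow(row)
            if not next_page():
                break

# With Selenium, the two callbacks would look roughly like this
# (assumption: Scholar disables the button on the last page):
#
# def get_rows():
#     return [[e.text.replace('\n', ' ')] for e in
#             driver.find_elements_by_class_name("gsc_1usr")]
#
# def next_page():
#     button = driver.find_element_by_xpath(button_locators)
#     if button.get_attribute("disabled"):
#         return False
#     button.click()
#     time.sleep(4)
#     return True

# Stand-alone demo with three fake pages:
pages = [[["Alice", "MIT"]], [["Bob", "CMU"]], [["Carol", "ETH"]]]
state = {"i": 0}

def fake_get_rows():
    return pages[state["i"]]

def fake_next_page():
    if state["i"] + 1 == len(pages):
        return False
    state["i"] += 1
    return True

scrape_all_pages(fake_get_rows, fake_next_page, "post.csv")
print(open("post.csv").read())
```

With this structure, adding or removing pages on the site requires no change to the script, and there is no magic number 7 to maintain.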

Also, in the future you should look to change all these time.sleep calls to wait.until, because sometimes the page loads quicker and the program could do its job faster. Or, even worse, your network might lag, and that would break your script.
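Concretely, wait.until just polls a condition until it returns something truthy or a timeout expires, which is why it is both faster and more robust than a fixed sleep. A pure-Python sketch of that mechanism (Selenium's WebDriverWait does the same thing, polling every 0.5 s by default):

```python
import time

def until(condition, timeout=3.0, poll=0.05):
    """Poll condition() until it returns a truthy value, or raise
    TimeoutError once the timeout expires -- the core of wait.until."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        result = condition()
        if result:
            return result
        time.sleep(poll)
    raise TimeoutError("condition not met within timeout")

# In the script above, instead of time.sleep(4) one could write:
#
# address = wait.until(
#     EC.presence_of_all_elements_located((By.CLASS_NAME, "gsc_1usr")))
#
# which returns as soon as the elements appear.

# Stand-alone demo: the condition becomes true on the third poll.
state = {"n": 0}

def ready():
    state["n"] += 1
    return state["n"] >= 3

print(until(ready))
```

The fixed sleep always costs its full duration; the polling wait costs only as long as the page actually takes, and fails loudly instead of silently scraping a half-loaded page.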

Note: technical posts on this site are licensed under CC BY-SA 4.0; please credit the original source when reposting. © 2020-2024 STACKOOM.COM