
Python > Selenium + CSV: How to open links from a list in a .csv file, loop the code, and append data to a .csv?

I am currently building a scraper and have managed to get the code going, but I need some coding assistance or tutorial recommendations for the following program:

1) A list of links in .csv format to open with webdriver

2) To run the same scraping code for all the links in the list

3) To append the output into a .csv file

The basic structure of the code:

from selenium import webdriver
import time
import csv
from selenium.webdriver.common.keys import Keys

driver = webdriver.Chrome

#driver.get("...link from csv file..."), e.g. with open('links.csv', 'r') as file:   etc...
time.sleep(5)

elements = driver.find_elements_by_class_name('data-xl')
csvfile = "output.csv"

with open(csvfile, "w", newline="") as output:
    writer = csv.writer(output)
    # column headers
    writer.writerow(["Reads", "Average Time Spent", "Impressions", "Read Time", "Likes", "Publication Shares", "Times Stacked", "Link-Outs"])

driver.quit()

The problems:

1) How to open the .csv from Python and visit the links with Selenium in succession, row by row (1, +1, +1...)

2) How to run the same code for every link visited in step (1) and, even when an error occurs (e.g. "element not found"), proceed to the next item in the .csv

3) How to create headers in the output .csv (note: the code structure above is inaccurate)

4) How to write the output to the .csv by appending, without overwriting earlier rows

Any hints on how to achieve the steps mentioned above would be helpful.

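First, change driver = webdriver.Chrome to driver = webdriver.Chrome(). Without the parentheses, driver is the Chrome class itself rather than a running browser instance, so calls such as driver.get(...) fail.

Here is the full code: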

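from selenium import webdriver
from selenium.common.exceptions import WebDriverException
from selenium.webdriver.common.by import By
import time
import csv

# link.csv below (one URL per row)
# https://google.com
# https://google.com
# https://google.com

driver = webdriver.Chrome()

# The with block closes both files even if an error occurs.
with open('link.csv', 'r', encoding='utf-8') as f, \
        open('output.csv', 'w', newline="", encoding="utf-8") as w:
    reader = csv.reader(f)
    writer = csv.writer(w)
    # Write the column headers once, before the loop.
    writer.writerow(
             ["Reads", "Average Time Spent", "Impressions", "Read Time", "Likes", "Publication Shares", "Times Stacked",
              "Link-Outs"])

    for line in reader:
        if not line:
            continue  # skip blank rows in link.csv
        try:
            driver.get(line[0])
            time.sleep(5)
            # Selenium 4 style; find_element_by_xpath was removed in Selenium 4.3.
            # Placeholder locator for the Google demo links; for the pages in the
            # question use something like (By.CLASS_NAME, 'data-xl').
            elements = driver.find_elements(By.XPATH, '//img[@alt="Google"]')
            # One output row per link, one cell per matched element
            # (.text is empty for an <img>; it holds the value for text elements).
            writer.writerow([element.text for element in elements])
        except WebDriverException as e:
            # Report the failing link and continue with the next one.
            print(line[0], e)

driver.quit()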
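
The four steps from the question can also be split into small helpers that do not depend on Selenium, which makes the CSV handling easy to check on its own. The following is only a sketch under some assumptions: the helper names read_links, append_rows and scrape_all are hypothetical, the links are assumed to sit in the first column of the input file, and scrape_one stands for whatever function returns one row of values for a URL.

```python
import csv
import os

HEADERS = ["Reads", "Average Time Spent", "Impressions", "Read Time", "Likes",
           "Publication Shares", "Times Stacked", "Link-Outs"]


def read_links(path):
    """Return the first column of every non-empty row in the CSV at path."""
    with open(path, newline="", encoding="utf-8") as f:
        return [row[0].strip() for row in csv.reader(f) if row and row[0].strip()]


def append_rows(path, rows, headers=HEADERS):
    """Append rows to path; write the header only if the file is new or empty."""
    needs_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if needs_header:
            writer.writerow(headers)
        writer.writerows(rows)


def scrape_all(links, scrape_one, output_path):
    """Call scrape_one(url) for each link, append its row, and skip failures.

    Returns the list of links that raised an error so they can be retried.
    """
    failed = []
    for url in links:
        try:
            row = scrape_one(url)
        except Exception as exc:  # broad on purpose: any failure skips this link
            print(f"skipping {url}: {exc}")
            failed.append(url)
            continue
        # Append right away so finished rows survive a later crash.
        append_rows(output_path, [row])
    return failed
```

With Selenium, scrape_one could be a function that calls driver.get(url), waits for the page, and returns the text of the matched elements as a list. Because the file is opened in append mode ("a") and the header is written only when the file is empty, running the script again adds rows below the existing ones instead of overwriting them.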
