I'm currently building a scraper and managed to get the code running, but I need some coding assistance / tutorial recommendations for the following program:
1) A list of links in .csv format to open with webdriver
2) To run the same scraping code for all the links in the list
3) To append the output into a .csv file
The basic structure of the code:
from selenium import webdriver
import time
import csv
from selenium.webdriver.common.keys import Keys
driver = webdriver.Chrome
#driver.get("...link from csv file..."), e.g. with open('links.csv', 'r') as file: etc...
time.sleep(5)
elements = driver.find_elements_by_class_name('data-xl')
csvfile = "output.csv"
with open(csvfile, "w", newline="") as output:
    writer = csv.writer(output)
    writer.writerow(["Reads", "Average Time Spent", "Impressions", "Read Time",
                     "Likes", "Publication Shares", "Times Stacked", "Link-Outs"])
    # column headers
driver.quit()
The problems:
1) How to open the .csv with Python/Selenium and visit the links in succession, row by row (1, +1, +1...)
2) How to loop the scraping code over all the links from step (1) so that, even on errors (e.g. "element not found"), it can proceed to the next item in the .csv
3) How to create headers in the .csv (note: the code structure above is inaccurate)
4) How to print the output to the .csv by appending, without overwriting what is already there
Any hints on how to achieve the steps mentioned above would be helpful
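For problems (3) and (4) specifically — writing the header once and then only ever appending — one common pattern is to open the file in "a" mode and write the header row only when the file is new or empty. This is a sketch; `append_row` and `HEADERS` are hypothetical names, not from the post:

```python
import csv
import os

HEADERS = ["Reads", "Average Time Spent", "Impressions", "Read Time",
           "Likes", "Publication Shares", "Times Stacked", "Link-Outs"]

def append_row(csv_path, row):
    """Append one row to csv_path, writing the header only if the file
    does not exist yet or is empty, so repeated runs never duplicate it."""
    write_header = (not os.path.exists(csv_path)
                    or os.path.getsize(csv_path) == 0)
    with open(csv_path, "a", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        if write_header:
            writer.writerow(HEADERS)
        writer.writerow(row)
```

Because the file is opened in append mode, you can call `append_row` once per scraped link (or once per script run) and the existing rows are never overwritten.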
First you have to change
driver = webdriver.Chrome
to
driver = webdriver.Chrome()
so that a Chrome driver instance is actually created — without the parentheses you only assign the class itself.
Here is the full code:
from selenium import webdriver
from selenium.webdriver.common.by import By
import time
import csv

# link.csv (one URL per row):
# https://google.com
# https://google.com
# https://google.com

driver = webdriver.Chrome()

f = open('link.csv', 'r', encoding='utf-8')
reader = csv.reader(f)
w = open('output.csv', 'w', newline="", encoding="utf-8")
writer = csv.writer(w)

# Write the header row once, before the loop.
writer.writerow(["Reads", "Average Time Spent", "Impressions", "Read Time",
                 "Likes", "Publication Shares", "Times Stacked", "Link-Outs"])

for line in reader:
    try:
        driver.get(line[0])
        time.sleep(5)
        # Selenium 4 removed find_element_by_xpath; use By.XPATH instead.
        element = driver.find_element(By.XPATH, '//img[@alt="Google"]')
        # Replace this with the real fields you scrape from each page.
        writer.writerow([line[0], element.get_attribute('alt')])
    except Exception:
        continue  # skip links that error out and move on to the next row

f.close()
w.close()
driver.quit()
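For problem (2), the skip-on-error behaviour can also be pulled out into a small helper that wraps each per-link scrape in try/except and keeps going. This is a sketch; `scrape_all` and the `scrape` callable are hypothetical names standing in for the Selenium code above:

```python
def scrape_all(links, scrape):
    """Call scrape(link) for each link; on any exception (e.g. Selenium's
    NoSuchElementException) record a placeholder row and continue with
    the next link instead of aborting the whole run."""
    rows = []
    for link in links:
        try:
            rows.append(scrape(link))
        except Exception as exc:
            rows.append([link, f"error: {exc}"])
    return rows
```

Keeping the error handling inside the loop body is what guarantees one bad page cannot stop the remaining links from being processed.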