
Write data in csv file but data are overwritten

I am trying to scrape the data, but the rows get overwritten and the CSV file ends up containing only the data from the last page (page 2). I think the data is overwritten because of the for loop. Is there any way to fix this? This is the page link: https://www.askgamblers.com/online-casinos/countries/ca/ Thank you.

from selenium import webdriver
import time
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from webdriver_manager.chrome import ChromeDriverManager
from bs4 import BeautifulSoup
import pandas as pd
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.wait import WebDriverWait
from csv import writer


options = webdriver.ChromeOptions()
options.add_argument("--no-sandbox")
options.add_argument("--disable-gpu")
options.add_argument("--window-size=1920x1080")
options.add_argument("--disable-extensions")
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
wait = WebDriverWait(driver, 20) 
for page in range(1,3):             
    URL = 'https://www.askgamblers.com/online-casinos/countries/ca/{page}'.format(page=page)
    driver.get(URL)
    time.sleep(2)

    urls= []
    data = []
    page_links =driver.find_elements(By.XPATH, "//div[@class='card__desc']//a[starts-with(@href, '/online')]")
    for link in page_links:
        href=link.get_attribute("href")
        urls.append(href)    
       
  
    with open('product.csv', 'w',newline='',encoding='utf-8') as csvfile:
        thewriter=writer(csvfile)
        header=['name','url','website_link','company','rating']
        thewriter.writerow(header)
        
        
        for url in urls:
            driver.get(url)
            time.sleep(1)
            # defaults so a failed lookup below doesn't raise NameError
            name = company = link = rate = ""
            
            try:
                name=driver.find_element(By.CSS_SELECTOR,"h1.review-intro__title").text   
            except:
                pass
            
            try:
                company=driver.find_element(By.XPATH,"//p[span[contains(.,'Company')]]/following-sibling::div").text   
            except:
                pass
            try:
                link=driver.find_element(By.XPATH,"//p[span[contains(.,'Website')]]/following-sibling::div").text   
            except:
                pass
            
            try:
                rate=driver.find_element(By.CSS_SELECTOR,"span.rating-ring__number").text
                
            except:
                pass
            
            jobinfo=[name,url,link,company,rate]
            thewriter.writerow(jobinfo)

You open the same file for (over)writing with 'w' on every pass of the page loop, so each page's rows replace the previous page's. Use a different file name per page, or open with 'a' (append) instead, but with the current structure append mode will also write the header once per page.
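If you do want append mode, you can guard the header write so it happens only when the file does not exist yet. A minimal sketch of that pattern, with a hypothetical temp-file path and dummy rows standing in for the scraped values:

```python
import csv
import os
import tempfile

# Hypothetical path; in the real script this would be 'product.csv'.
path = os.path.join(tempfile.mkdtemp(), "product.csv")

for page in range(1, 3):
    # Write the header only on the first pass, before the file exists.
    write_header = not os.path.exists(path)
    with open(path, "a", newline="", encoding="utf-8") as csvfile:
        thewriter = csv.writer(csvfile)
        if write_header:
            thewriter.writerow(["name", "url", "website_link", "company", "rating"])
        # Dummy row per page in place of the scraped values.
        thewriter.writerow([f"casino-{page}", f"url-{page}", "", "", ""])

with open(path, encoding="utf-8") as f:
    rows = f.read().splitlines()
print(len(rows))  # 3: one header plus one row per page
```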

Better would be to open the file for writing outside the for page loop, write the header once, and then write the rows inside the for page loop.

Basically:

with open('product.csv', 'w', newline='', encoding='utf-8') as csvfile:
    thewriter = writer(csvfile)
    header = ['name', 'url', 'website_link', 'company', 'rating']
    thewriter.writerow(header)

    for page in range(1, 3):
        ...  # compute the row info
        jobinfo = [name, url, link, company, rate]
        thewriter.writerow(jobinfo)
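The open-once pattern can be demonstrated without Selenium. A self-contained sketch using an in-memory buffer in place of `product.csv`, with hypothetical page data in place of the scraped rows:

```python
import csv
import io

# Hypothetical scraped rows for two pages.
pages = {
    1: [["Casino A", "https://example.com/a", "siteA", "Co A", "9.1"]],
    2: [["Casino B", "https://example.com/b", "siteB", "Co B", "8.7"]],
}

buf = io.StringIO()  # stands in for open('product.csv', 'w', newline='')
thewriter = csv.writer(buf)
thewriter.writerow(["name", "url", "website_link", "company", "rating"])

for page in range(1, 3):  # the file stays open, so nothing is overwritten
    for row in pages[page]:
        thewriter.writerow(row)

lines = buf.getvalue().splitlines()
print(len(lines))  # 3: header plus one row per page
```

Because the writer is created once before the loop, every page appends to the same open handle and the header appears exactly once.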
