Python selenium web scraped data to csv export

I am working on a custom web scraper for e-commerce sites. I want it to scrape the names and prices of the listings on a site and then export them to CSV. The problem is that it exports only one (name, price) pair and writes it on every line of the CSV. I couldn't find a good solution for this. I hope this isn't an extremely stupid question; I suspect the fix is easy. I hope someone will read my code and help me, thank you!

###imports
from selenium.webdriver.common.keys import Keys
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
import csv
import pandas as pd


#driver path
driver = webdriver.Firefox(executable_path="D:\Programy\geckoDriver\geckodriver.exe")

#init + search
driver.get("https://pc.bazos.sk/pc/")
time.sleep(1)
nazov = driver.find_element_by_name("hledat")
nazov.send_keys("xeon")
cenamin = driver.find_element_by_name("cenaod")
cenamin.send_keys("")
cenamax = driver.find_element_by_name("cenado")
cenamax.send_keys("300")
driver.find_element_by_name("Submit").click()

##cookie acceptor
driver.find_element_by_xpath("/html/body/div[1]/button").click()

##main
x = 3
for i in range(x):
    try:
        main = WebDriverWait(driver, 7).until(
            EC.presence_of_element_located((By.XPATH, "/html/body/div[1]/table/tbody/tr/td[2]"))
        )
        
        ##find listings in table
        inzeraty = main.find_elements_by_class_name("vypis")
        for vypis in inzeraty:
            nadpis = vypis.find_element_by_class_name("nadpis")    
            ##print listings to check correctness
            nadpist = nadpis.text
            print(nadpist)
        
        ##find the price and print 
        for vypis in inzeraty:
            cena = vypis.find_element_by_class_name("cena")
            cenat = cena.text
            print(cenat)
        
        ##export to csv - not working
        time.sleep(1)
        print("Writing to csv")
        d = {"Nazov": [nadpist]*20*x,"Cena": [cenat]*20*x}
        df = pd.DataFrame(data=d)
        df.to_csv("bobo.csv")
        time.sleep(1)
        print("Writing to csv done !")
        
        ##next page
        dalsia = driver.find_element_by_link_text("Ďalšia")
        dalsia.click()
    except:
        driver.quit()

I want the CSV to look like:

  1. name, price
  2. name2, price2

It would be great if the CSV had only two columns and x rows, depending on the number of listings.

from selenium.webdriver.common.keys import Keys
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
import pandas as pd

#driver path
driver = webdriver.Chrome()

#init + search
driver.get("https://pc.bazos.sk/pc/")
time.sleep(1)
nazov = driver.find_element_by_name("hledat")
nazov.send_keys("xeon")
cenamin = driver.find_element_by_name("cenaod")
cenamin.send_keys("")
cenamax = driver.find_element_by_name("cenado")
cenamax.send_keys("300")
driver.find_element_by_name("Submit").click()

##cookie acceptor
time.sleep(10)
driver.find_element_by_xpath("/html/body/div[1]/button").click()

##main
x = 3
d = []
for i in range(x):
    try:
        main = WebDriverWait(driver, 7).until(
            EC.presence_of_element_located(
                (By.XPATH, "/html/body/div[1]/table/tbody/tr/td[2]")))

        ##find listings in table
        inzeraty = main.find_elements_by_class_name("vypis")
        for vypis in inzeraty:
            d.append({"Nazov": vypis.find_element_by_class_name("nadpis").text,
            "Cena": vypis.find_element_by_class_name("cena").text
                })

        ##next page
        dalsia = driver.find_element_by_link_text("Ďalšia")
        dalsia.click()
    except Exception:
        # Stop paging once a page fails to load or there is no "Ďalšia" link.
        driver.quit()
        break

time.sleep(1)
print("Writing to csv")
df = pd.DataFrame(data=d)
df.to_csv("bobo.csv",index=False)

This gives me 59 items with prices. Each listing is first stored as a dict, the dict is appended to a list, and the list is then passed to pandas.
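
To see the list-of-dicts pattern without launching a browser, here is a minimal sketch. The `scraped` values are made-up placeholders standing in for the `.text` of each listing, and the CSV is written to an in-memory buffer rather than `bobo.csv`; both are assumptions for illustration only.

```python
import io

import pandas as pd

# Hypothetical (title, price) pairs standing in for the scraped .text values.
scraped = [("Listing A", "100"), ("Listing B", "250"), ("Listing C", "80")]

# One dict per listing, appended to a single list that lives outside any page loop.
rows = []
for name, price in scraped:
    rows.append({"Nazov": name, "Cena": price})

# pandas turns a list of dicts into one row per dict and one column per key.
df = pd.DataFrame(rows)

# index=False drops the row-number column, so only the two columns are written.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)
```

Because each dict holds the title and price of the same listing, the two columns cannot drift out of alignment.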

All you need to do is create two empty lists, nadpist_l and cenat_l, append the data to those lists, and finally save the lists as a DataFrame.
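
As a rough illustration of why the original export repeats one row, and how appending to lists avoids it, consider the sketch below. The `scraped` pairs are made-up placeholders, and it uses the standard-library `csv` module with an in-memory buffer instead of pandas to stay dependency-free; passing `{"Nazov": names, "Cena": prices}` to `pd.DataFrame` should produce the same table.

```python
import csv
import io

# Hypothetical (title, price) pairs standing in for the scraped .text values.
scraped = [("Listing A", "100"), ("Listing B", "250"), ("Listing C", "80")]

# What the question's code effectively does: only the last value seen in the
# loop survives, and multiplying a one-element list just repeats that value.
for name, price in scraped:
    last_name = name
repeated = [last_name] * len(scraped)

# The fix: append every value to a list as it is read.
names, prices = [], []
for name, price in scraped:
    names.append(name)
    prices.append(price)

# Write one row per listing, pairing the two lists with zip().
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["Nazov", "Cena"])
writer.writerows(zip(names, prices))
csv_text = buffer.getvalue()

print(repeated)
print(csv_text)
```

The lists must be created before the page loop; creating them inside it would discard the previous pages on every iteration.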

UPDATED as per the comment

Check if this works

###imports
from selenium.webdriver.common.keys import Keys
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
import pandas as pd

#driver path
driver = webdriver.Chrome()

#init + search
driver.get("https://pc.bazos.sk/pc/")
time.sleep(1)
nazov = driver.find_element_by_name("hledat")
nazov.send_keys("xeon")
cenamin = driver.find_element_by_name("cenaod")
cenamin.send_keys("")
cenamax = driver.find_element_by_name("cenado")
cenamax.send_keys("300")
driver.find_element_by_name("Submit").click()

##cookie acceptor
time.sleep(10)
driver.find_element_by_xpath("/html/body/div[1]/button").click()

##main
x = 3
# Create the lists once, outside the page loop, so results accumulate across pages.
nadpist_l = []
cenat_l = []
for i in range(x):
    try:
        main = WebDriverWait(driver, 7).until(
            EC.presence_of_element_located(
                (By.XPATH, "/html/body/div[1]/table/tbody/tr/td[2]")))

        ##find listings in table
        inzeraty = main.find_elements_by_class_name("vypis")
        for vypis in inzeraty:
            nadpis = vypis.find_element_by_class_name("nadpis")
            nadpist_l.append(nadpis.text)

        ##find the price of each listing
        for vypis in inzeraty:
            cena = vypis.find_element_by_class_name("cena")
            cenat_l.append(cena.text)
        ##running total of collected prices, as a progress check
        print(len(cenat_l))

        ##next page
        dalsia = driver.find_element_by_link_text("Ďalšia")
        dalsia.click()
    except Exception:
        # Stop paging once a page fails to load or there is no "Ďalšia" link.
        driver.quit()
        break

##export to csv: one row per listing, two columns, no index column
print("Writing to csv")
df = pd.DataFrame(data={"Nazov": nadpist_l, "Cena": cenat_l})
df.to_csv("bobo.csv", index=False)
print("Writing to csv done !")
