
Python selenium web scraped data to csv export

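I'm building a custom web scraper for any kind of e-commerce site. I want it to scrape the names and prices of the listings on a site and export them to a csv, but the problem is that it exports only one (name, price) row and writes that same row on every line of the csv. I couldn't find a good solution. I hope I'm not asking something very silly, although I suspect the fix is easy. I hope someone will read my code and help me. Thanks!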

###imports
from selenium.webdriver.common.keys import Keys
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
import csv
import pandas as pd


#driver path
# raw string so the backslashes in the Windows path are not read as escape sequences
driver = webdriver.Firefox(executable_path=r"D:\Programy\geckoDriver\geckodriver.exe")

#init + search
driver.get("https://pc.bazos.sk/pc/")
time.sleep(1)
nazov = driver.find_element_by_name("hledat")
nazov.send_keys("xeon")
cenamin = driver.find_element_by_name("cenaod")
cenamin.send_keys("")
cenamax = driver.find_element_by_name("cenado")
cenamax.send_keys("300")
driver.find_element_by_name("Submit").click()

##cookie acceptor
driver.find_element_by_xpath("/html/body/div[1]/button").click()

##main
x = 3
for i in range(x):
    try:
        main = WebDriverWait(driver, 7).until(
            EC.presence_of_element_located((By.XPATH, "/html/body/div[1]/table/tbody/tr/td[2]"))
        )
        
        ##find listings in table
        inzeraty = main.find_elements_by_class_name("vypis")
        for vypis in inzeraty:
            nadpis = vypis.find_element_by_class_name("nadpis")    
            ##print listings to check correctness
            nadpist = nadpis.text
            print(nadpist)
        
        ##find the price and print 
        for vypis in inzeraty:
            cena = vypis.find_element_by_class_name("cena")
            cenat = cena.text
            print(cenat)
        
        ##export to csv - not working
        time.sleep(1)
        print("Writing to csv")
        d = {"Nazov": [nadpist]*20*x,"Cena": [cenat]*20*x}
        df = pd.DataFrame(data=d)
        df.to_csv("bobo.csv")
        time.sleep(1)
        print("Writing to csv done !")
        
        ##next page
        dalsia = driver.find_element_by_link_text("Ďalšia")
        dalsia.click()
    except:
        driver.quit()

I would like the csv to look like:

  1. name, price
  2. name2, price2

It would be great if the csv had only two columns and x rows, depending on the number of listings.

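
The repeated row can be reproduced without a browser. The sketch below is only an illustration under stated assumptions: the `listings` values are made up, not scraped from the site, and `repeated_names`, `repeated_prices` and `rows` are hypothetical names. It shows that once a loop has finished, the loop variables hold only the last item, so `[nadpist] * 20 * x` repeats that single value.

```python
# Hypothetical stand-in for the scraped listings (made-up values, not real data).
listings = [("Xeon E5-2670", "45 €"), ("Xeon X5650", "20 €"), ("Xeon E3-1230", "60 €")]

x = 3  # number of pages, as in the question

# Same pattern as the question: each loop only overwrites a single variable.
for name, price in listings:
    nadpist = name
for name, price in listings:
    cenat = price

# After the loops, nadpist and cenat hold the LAST listing only,
# so multiplying the one-element list just repeats that value.
repeated_names = [nadpist] * 20 * x
repeated_prices = [cenat] * 20 * x

print(len(repeated_names))   # number of entries in the repeated list
print(set(repeated_names))   # distinct names in that list

# Collecting inside the loop keeps one entry per listing instead.
rows = [(name, price) for name, price in listings]
print(rows)
```

Collecting the values inside the loop, rather than multiplying a list afterwards, is probably what gives one row per listing.
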
from selenium.webdriver.common.keys import Keys
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
import pandas as pd

#driver path
driver = webdriver.Chrome()

#init + search
driver.get("https://pc.bazos.sk/pc/")
time.sleep(1)
nazov = driver.find_element_by_name("hledat")
nazov.send_keys("xeon")
cenamin = driver.find_element_by_name("cenaod")
cenamin.send_keys("")
cenamax = driver.find_element_by_name("cenado")
cenamax.send_keys("300")
driver.find_element_by_name("Submit").click()

##cookie acceptor
time.sleep(10)
driver.find_element_by_xpath("/html/body/div[1]/button").click()

##main
x = 3
d = []
for i in range(x):
    try:
        main = WebDriverWait(driver, 7).until(
            EC.presence_of_element_located(
                (By.XPATH, "/html/body/div[1]/table/tbody/tr/td[2]")))

        ##find listings in table
        inzeraty = main.find_elements_by_class_name("vypis")
        for vypis in inzeraty:
            d.append({"Nazov": vypis.find_element_by_class_name("nadpis").text,
            "Cena": vypis.find_element_by_class_name("cena").text
                })

        ##next page
        dalsia = driver.find_element_by_link_text("Ďalšia")
        dalsia.click()
    except:
        driver.quit()

time.sleep(1)
print("Writing to csv")
df = pd.DataFrame(data=d)
df.to_csv("bobo.csv",index=False)

This gave me the prices of 59 items. Each listing is first added to a dict, the dict is appended to a list, and the list is then passed to pandas.
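
The dict-then-list step can be tried on its own, without Selenium. This is a minimal sketch assuming made-up rows in a hypothetical `scraped` list, and it writes to an in-memory buffer (`buffer`, `csv_text` are illustrative names) instead of `bobo.csv`.

```python
import io

import pandas as pd

# Hypothetical rows standing in for the scraped listings (made-up values).
scraped = [
    ("Xeon E5-2670", "45 €"),
    ("Xeon X5650", "20 €"),
]

d = []  # one dict per listing
for nazov, cena in scraped:
    d.append({"Nazov": nazov, "Cena": cena})

# A list of dicts becomes one DataFrame row per dict, one column per key.
df = pd.DataFrame(data=d)

# Write to an in-memory buffer so the sketch leaves no file behind;
# passing a path such as "bobo.csv" works the same way.
buffer = io.StringIO()
df.to_csv(buffer, index=False)  # index=False omits the row-number column
csv_text = buffer.getvalue()
print(csv_text)
```

With `index=False` the output should contain only the two columns asked for in the question.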

All you need to do is create two empty lists, nadpist_l and cenat_l, append the data to those lists, and finally save the lists as a DataFrame.
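
As a browser-free sketch of the two-list idea, assuming a hypothetical `pages` structure with made-up values in place of the scraped pages: the point it tries to show is that the lists are created once, before the page loop, so that rows from every page accumulate.

```python
import pandas as pd

# Hypothetical pages of (title, price) pairs (made-up values, not real data).
pages = [
    [("Xeon E5-2670", "45 €"), ("Xeon X5650", "20 €")],
    [("Xeon E3-1230", "60 €")],
]

# Created once, outside the page loop, so nothing is reset between pages.
nadpist_l = []
cenat_l = []

for page in pages:
    for nadpist, cenat in page:
        # Append both values together so the lists stay the same length.
        nadpist_l.append(nadpist)
        cenat_l.append(cenat)

# Two equal-length lists map directly onto two DataFrame columns.
df = pd.DataFrame({"Nazov": nadpist_l, "Cena": cenat_l})
print(df.shape)
```

If the lists were created inside the page loop instead, each page would overwrite the rows collected from the previous one.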

Updated based on the comments

Check whether this works

###imports
from selenium.webdriver.common.keys import Keys
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
import pandas as pd

#driver path
driver = webdriver.Chrome()

#init + search
driver.get("https://pc.bazos.sk/pc/")
time.sleep(1)
nazov = driver.find_element_by_name("hledat")
nazov.send_keys("xeon")
cenamin = driver.find_element_by_name("cenaod")
cenamin.send_keys("")
cenamax = driver.find_element_by_name("cenado")
cenamax.send_keys("300")
driver.find_element_by_name("Submit").click()

##cookie acceptor
time.sleep(10)
driver.find_element_by_xpath("/html/body/div[1]/button").click()

##main
x = 3
##create both lists once, before the page loop, so rows from every page accumulate
nadpist_l = []
cenat_l = []
for i in range(x):
    try:
        main = WebDriverWait(driver, 7).until(
            EC.presence_of_element_located(
                (By.XPATH, "/html/body/div[1]/table/tbody/tr/td[2]")))

        ##find listings in table
        inzeraty = main.find_elements_by_class_name("vypis")
        for vypis in inzeraty:
            ##read title and price first, then append both so the lists stay aligned
            nadpist = vypis.find_element_by_class_name("nadpis").text
            cenat = vypis.find_element_by_class_name("cena").text
            nadpist_l.append(nadpist)
            cenat_l.append(cenat)
        print(len(cenat_l))

        ##next page
        dalsia = driver.find_element_by_link_text("Ďalšia")
        dalsia.click()
    except Exception:
        ##stop paging once something fails (for example, no next-page link)
        driver.quit()
        break

##export to csv: two columns, one row per listing
time.sleep(1)
print("Writing to csv")
df = pd.DataFrame(data={"Nazov": nadpist_l, "Cena": cenat_l})
df.to_csv("bobo.csv", index=False)
time.sleep(1)
print("Writing to csv done !")
