
Get information from webpage with the same class names (Python Selenium)

I have a simple question that can presumably be solved very easily. However, I have already spent some time trying to extract the four lines of information shown here:

see html structure here

I first try to access the <ul _ngcontent-xl-byg-c79="" class="short ng-star-inserted"> item and then loop over the <li _ngcontent-xl-byg-c79="" class="table-row ng-star-inserted"> items in order to store the embedded information in my DataFrame (columns are 'Mærke', 'Produkttype', 'Serie', and 'Model').

What am I doing wrong? My problem is that the four lines have the same class name, which gives me the same output in all four loop iterations.

This is my code:

from selenium import webdriver
import pandas as pd

# Activate web browser: External control

browser = webdriver.Chrome(r'C:\Users\KristerJens\Downloads\chromedriver_win32\chromedriver')

# Get webpage  

browser.get("https://www.xl-byg.dk/shop/knauf-insulation-ecobatt-murfilt-190-mm-2255993")

# Get information

brand= []
product= []
series=[]
model=[]

for i in browser.find_elements_by_xpath("//ul[@class='short ng-star-inserted']/li"):
    for p in i.find_elements_by_xpath("//span[@class='attribute-name']"):
        brand.append(i.find_elements_by_class_name('?').text)
        product.append(i.find_elements_by_class_name('?').text)
        series.append(i.find_elements_by_class_name('?').text)
        model.append(i.find_elements_by_class_name('?').text)

df = pd.DataFrame()

df['brand'] = brand
df['product'] = product
df['series'] = series
df['model'] = model

Any help is much appreciated!

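The XPath in your inner loop starts with "//", which searches the whole document even when it is called on an element, so every iteration matches the same nodes. Start the expression with "." to search only inside the current <li>. Try the following and confirm: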

from selenium import webdriver
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

driver = webdriver.Chrome(executable_path="path to chromedriver.exe")
driver.maximize_window()
driver.implicitly_wait(10)

driver.get("https://www.xl-byg.dk/shop/knauf-insulation-ecobatt-murfilt-190-mm-2255993")

wait = WebDriverWait(driver,30)

# Cookie pop-up
wait.until(EC.element_to_be_clickable((By.XPATH,"//button[@aria-label='Accept all' or @aria-label = 'Accepter alle']"))).click()

options = driver.find_elements(By.XPATH, "//div[@class='row-column']//ul[contains(@class,'short')]/li")
for opt in options:
    # A leading "." makes the XPath relative to the current element instead of the whole document
    attribute = opt.find_element(By.XPATH, "./span[@class='attribute-name']").text
    value = opt.find_element(By.XPATH, "./*[contains(@class,'ng-star-inserted')]").text
    print(f"{attribute} : {value}")

Output:

Mærke : Knauf Insulation
Produkttype : Murfilt
Serie : ECOBATT
Materiale : Glasmineraluld
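
Since the question asks for a DataFrame rather than printed lines, the attribute/value pairs could be collected into one row keyed by column name. The following is only a minimal sketch: pairs_to_row is a hypothetical helper, and the pairs are hard-coded from the output above so the snippet runs without a browser. In the loop you would append (attribute, value) to a list instead of printing.

```python
def pairs_to_row(pairs, columns):
    """Map (attribute, value) pairs onto a fixed list of column names.

    Columns that do not appear on the page are filled with None.
    pairs_to_row is a hypothetical helper, not part of Selenium or pandas.
    """
    found = dict(pairs)
    return {col: found.get(col) for col in columns}


# Pairs copied from the printed output above (hard-coded for illustration)
pairs = [
    ("Mærke", "Knauf Insulation"),
    ("Produkttype", "Murfilt"),
    ("Serie", "ECOBATT"),
    ("Materiale", "Glasmineraluld"),
]

# Column names taken from the question
columns = ["Mærke", "Produkttype", "Serie", "Model"]

row = pairs_to_row(pairs, columns)
print(row)
```

Note that this page lists 'Materiale' rather than 'Model', so the 'Model' column stays empty here. Keying on the attribute name instead of the position avoids mixing up columns when products expose different attributes. With pandas installed, pd.DataFrame([row]) should turn the dictionary into a one-row DataFrame.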


 