
Scraping an element with multiple classes in Selenium

I'm trying to use Selenium WebDriver to extract the text inside the element with class="a-size-based-plus a-color-base" in the following HTML.

[Screenshot of the HTML: the text to scrape is inside the blue outline]

My code looks like this:

from selenium import webdriver
from selenium.webdriver import Chrome
from selenium.webdriver.common.by import By
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.expected_conditions import presence_of_element_located

import os
import re  # regular expressions (Python standard library)
import time
import numpy as np
import pandas as pd
from difflib import SequenceMatcher
BASE_DIR = os.path.dirname(os.path.abspath(__file__))

-----HERE is some unrelated code-----

# Find Data
    i = 0
    driver.execute_script("window.scrollTo(0,document.body.scrollHeight)")
    wait = WebDriverWait(driver, 20)
    wait.until(EC.element_to_be_clickable(
        (By.CLASS_NAME, 'xtaqv-root')))
    wait.until(EC.element_to_be_clickable((By.CLASS_NAME, 'extension-rank')))
    wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, '[data-src="price"]')))
    time.sleep(5)

    for element in driver.find_elements_by_class_name('xtaqv-root'):
        # Ratio of similarity
        try:
            item_name = element.find_element_by_tag_name("h2").text
            ratio = SequenceMatcher(None, item_name, key).ratio()
        except Exception:
            item_name = np.nan
            ratio = 0
        try:
            link = element.find_element_by_css_selector('[data-src="price"]')
            href = link.get_attribute('href')
        except Exception:
            href = np.nan
        try:
            brand = element.find_element_by_css_selector('.a-size-based-plus.a-color-base')
            brand = brand.text
        except Exception:
            brand = np.nan
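The `ratio` in the first try block comes from `difflib.SequenceMatcher.ratio()`, which returns 2*M/T, where M is the number of matched characters and T is the combined length of both strings (1.0 means identical, 0.0 means nothing in common). A small standalone check with made-up strings, not taken from the page:

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return difflib's similarity ratio 2*M/T, in the range [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


print(similarity("abcd", "abce"))  # "abc" matches: 2*3/8 -> 0.75
print(similarity("abc", "abc"))    # identical strings -> 1.0
print(similarity("abc", "xyz"))    # nothing in common -> 0.0
```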

The last try-except in the code is the most important.
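All three try/except blocks follow the same pattern: run one lookup and fall back to NaN if it fails. As a sketch only (the helper name `get_or_default` is hypothetical, not part of Selenium), that pattern could be factored into one function. Plain dictionary lookups stand in for the Selenium calls so the example runs without a browser, and `math.nan` is the same float NaN value that `np.nan` represents.

```python
import math
from typing import Any, Callable, Tuple, Type


def get_or_default(
    fetch: Callable[[], Any],
    default: Any = math.nan,
    exceptions: Tuple[Type[BaseException], ...] = (Exception,),
) -> Any:
    """Call fetch(); if it raises one of `exceptions`, return `default` instead."""
    try:
        return fetch()
    except exceptions:
        return default


# Demo: dictionary lookups stand in for element.find_element... calls.
data = {"name": "Widget"}
print(get_or_default(lambda: data["name"]))   # Widget
print(get_or_default(lambda: data["brand"]))  # nan (the KeyError was caught)
```

With Selenium, `fetch` would be a lambda wrapping the `find_element...` call plus `.text`, and passing `exceptions=(NoSuchElementException,)` (importable from `selenium.common.exceptions`) would avoid silently swallowing unrelated errors the way a bare `except:` does.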

Looking at the HTML, I see a typo in your locator on this line:

brand = element.find_element_by_css_selector('.a-size-based-plus.a-color-base')

It should be size-base, not size-based. Try this:

brand = element.find_element_by_css_selector('.a-size-base-plus.a-color-base')
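The selector `.a-size-base-plus.a-color-base` is a compound class selector: it matches only elements whose `class` attribute contains both tokens, each spelled exactly, in any order. That is why one extra letter is enough to make the lookup fail. The sketch below imitates that matching rule with the standard-library `html.parser` so it runs without a browser. It is a simplification (class-only selectors, no void elements), not Selenium's selector engine, and the sample HTML and the text "Brand X" are made up.

```python
from html.parser import HTMLParser
from typing import List, Set


class ClassTextFinder(HTMLParser):
    """Collect the text of elements whose class list contains all required classes."""

    def __init__(self, required: Set[str]) -> None:
        super().__init__()
        self.required = required
        self.depth = 0  # > 0 while inside a matching element
        self.matches: List[str] = []

    def handle_starttag(self, tag, attrs):
        classes = set((dict(attrs).get("class") or "").split())
        if self.depth:
            self.depth += 1  # nested tag inside a match
        elif self.required <= classes:
            self.depth = 1  # every required class is present
            self.matches.append("")

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.matches[-1] += data


def find_text(html: str, selector: str) -> List[str]:
    """Handle only selectors made of class names, such as '.a.b'."""
    required = {c for c in selector.split(".") if c}
    finder = ClassTextFinder(required)
    finder.feed(html)
    return [m.strip() for m in finder.matches]


SAMPLE_HTML = '<div><span class="a-size-base-plus a-color-base">Brand X</span></div>'

print(find_text(SAMPLE_HTML, ".a-size-based-plus.a-color-base"))  # [] (typo)
print(find_text(SAMPLE_HTML, ".a-size-base-plus.a-color-base"))   # ['Brand X']
print(find_text(SAMPLE_HTML, ".a-color-base.a-size-base-plus"))   # ['Brand X']
```

Note that the `find_element_by_*` helpers used in the question were removed in Selenium 4.3; with current Selenium the same lookup is `element.find_element(By.CSS_SELECTOR, '.a-size-base-plus.a-color-base')`. `By.CLASS_NAME` is meant for a single class name, so a CSS selector is the natural way to require both classes.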

Hope this helps.
