
How to scrape data using Selenium and Python: I am trying to extract all the data in the title div tag


from selenium import webdriver
import pandas as pd
import time
import requests
from selenium.common.exceptions import ElementClickInterceptedException

driver = webdriver.Chrome(executable_path ="D:\\chromedriver_win32\chromedriver.exe")
url = "https://www.fynd.com/brands/"
driver.get(url)
time.sleep(2)
driver.maximize_window()
luxury_brand_names = []
element = driver.find_element_by_css_selector("//div[@class='group-cards']")#.get_attribute("title")
#element = driver.find_elements_by_xpath("//div[@classdata-v-2f624c7c data-v-73869697 title]")
for a in element:
    luxury_brand_names.append()
print(luxury_brand_names)

This is the code I am running, but I am not getting any output. Please help me with this; I am very new to coding and scraping data. I am trying to get all the data that is in the title div tag.

I think the only changes you need are to fix your selector, locate the elements with find_elements, and loop through them. You also need to actually pass a value to append(). It should be:

elements = driver.find_elements_by_css_selector("div.card-item")
for element in elements:
    luxury_brand_names.append(element.get_attribute('title'))
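
Note that the find_elements_by_* helpers used above were removed in Selenium 4.3; current releases use find_elements(By.CSS_SELECTOR, ...) instead. Below is a minimal sketch of the same idea in that style, with an explicit wait in place of time.sleep. It assumes a recent Selenium 4 release that can locate chromedriver on its own, and it reuses the div.card-item selector and the title attribute from this answer, neither of which has been checked against the live page. The names collect_titles and scrape_brand_names are hypothetical helpers, not part of Selenium.

```python
def collect_titles(elements):
    """Return the non-empty title attribute of each element."""
    titles = []
    for element in elements:
        title = element.get_attribute("title")
        if title:  # skip elements that have no title attribute
            titles.append(title)
    return titles


def scrape_brand_names(url="https://www.fynd.com/brands/"):
    # Imported here so collect_titles can be reused without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    # Assumption: Selenium finds a matching chromedriver without a path.
    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Wait up to 10 seconds for the cards (selector is an assumption).
        cards = WebDriverWait(driver, 10).until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, "div.card-item"))
        )
        return collect_titles(cards)
    finally:
        driver.quit()  # always close the browser, even on errors
```

Calling print(scrape_brand_names()) would open Chrome, wait for the cards, and print whatever titles were found. If the list comes back empty, the selector or the attribute name is the first thing to re-check in the browser's inspector.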

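First of all, your append() call is empty, so nothing is added to the list (calling append() with no argument actually raises a TypeError).

Second, use find_elements (plural) so that you get a list of items to loop over. Note that "//div[@class='card-item']" is an XPath expression, so pass it to find_elements_by_xpath rather than find_elements_by_css_selector, for example element = driver.find_elements_by_xpath("//div[@class='card-item']"). Then, in your loop, use: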

luxury_brand_names.append(a.get_attribute("title"))
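
The effect of an empty append() call can be checked in plain Python, without a browser. The following is only a small sketch; brand_names and "Example Brand" are made-up placeholders, not values from the page.

```python
brand_names = []

try:
    brand_names.append()  # no argument: Python raises TypeError
except TypeError as exc:
    error_message = str(exc)

# The failed call left the list untouched.
length_after_failed_call = len(brand_names)

# Passing a value is what actually stores something.
brand_names.append("Example Brand")  # hypothetical value
print(brand_names)
```

In the scraping loop, the value passed to append() is the attribute read from each element, as shown in the line above.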

Here is an answer using Beautiful Soup and Selenium together:

from bs4 import BeautifulSoup
from selenium import webdriver


url = "https://www.fynd.com/brands/"
driver = webdriver.Chrome(executable_path="D:\\chromedriver_win32\\chromedriver.exe")
driver.get(url)
soup = BeautifulSoup(driver.page_source,"html.parser")
titles = soup.find_all('span', {'class': 'ukt-title clrWhite'})
# Collect the stripped text of every matching span
all_titles = [title.text.strip() for title in titles]

print(all_titles)


 