
Selenium only prints one output

I am trying to scrape an e-commerce page. When I use Selenium to scrape the product titles, I only get one result. (Alternative ways to scrape it with BS4 are also welcome.)

My code:

from selenium import webdriver
from selenium.webdriver.common.keys import Keys
import pandas as pd  
from bs4 import BeautifulSoup
import requests

PATH = r"C:\Program Files (x86)\chromedriver.exe"  # raw string keeps the backslashes literal
SRC = requests.get("https://egypt.souq.com").text
soup = BeautifulSoup(SRC, 'lxml')
driver = webdriver.Chrome(PATH)
driver.get("https://egypt.souq.com")

dotd = "/html/body/div[2]/div/main/div[1]/div[1]/div/div[1]/a/img"

driver.find_element_by_xpath(dotd).click()

def get_deals():
    title_xpath = "/html/body/div[1]/div/main/div/div[4]/div[3]/div[2]/div[1]/div[1]/div/div[2]/ul/li[1]/h6/span/a"
    titles = driver.find_elements_by_xpath(title_xpath)
    for title in titles:
        print(title.text)

get_deals()
print("successful")

The part I want to scrape:

<div class="columns small-8 medium-12">
    <ul class="body no-bullet">
        <li class="title-row">
            <h6 class="title">
                <span  class="itemTitle">
                    <a href="https://egypt.souq.com/eg-en/samsung-galaxy-m11-dual-sim-32gb-3gb-ram-4g-lte-metallic-blue-85271900033/u/" title="Samsung Galaxy M11 Dual SIM - 32GB, 3GB RAM, 4G LTE - Metallic Blue">
                        Samsung Galaxy M11 Dual SIM - 32GB, 3GB RAM, 4G LTE - Metallic Blue
                    </a>
                </span>
            </h6>
        </li>
        <li class="coupon-flag-row">
        </li>

        <li>

My output:

Samsung Galaxy M11 Dual SIM - 32GB, 3GB RAM, 4G LTE - Metallic Blue

successful

The page I am scraping:

https://deals.souq.com/eg-en/?utm_source=souq

Please help.

You can do it like this:

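from bs4 import BeautifulSoup
import requests

URL = "https://deals.souq.com/eg-en/?utm_source=souq"  # the deals page from the question

response = requests.get(URL)
html = response.text
soup = BeautifulSoup(html, "lxml")

# Every product title sits in <span class="itemTitle"><a title="...">
all_titles = soup.find_all("span", class_="itemTitle")
for span in all_titles:
    link = span.find("a")
    if link is not None:  # skip spans without a link
        print(link.get("title"))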

For this code to run you need to install lxml, which you can do by running pip install lxml in a command prompt.

The XPath in the question is absolute and pins positional indexes such as div[1] and li[1], which is why it matches only a single product. To get all the titles, select them by class with a CSS selector instead, and use WebDriverWait with visibility_of_all_elements_located() so that the elements have rendered before you read them:

titles = WebDriverWait(driver, 10).until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "h6.title>span.itemTitle>a")))
for title in titles:
    print(title.text)

You need the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By

Console output:

Samsung Galaxy M11 Dual SIM - 32GB, 3GB RAM, 4G LTE - Metallic Blue
Electrostar HW50101 Electric Water Heater -50 Liter, White
PANTENE Anti Hair Fall Shampoo, 400 ml with Anti Hair Fall Oil Replacement, 180 ml and 3 Minute Miracle Daily Care Conditioner and Mask, 200 ml
SHARP SJ-GV63G-RD Inverter Refrigerator with Hoover DXOA38AC3R-ELA Washing Machine, La Germania 9M10Gub1X4Aww Cooker, Toshiba 4K Smart 55 Inch TV - 55U5965EA, TOSHIBA VC-EA1800SE Vacuum Cleaner, Tornado FP-1000SG Food Processor, Tornado TCM-11415-B Espresso Machine and Tornado EFS-360/903G Stand Fan - 16 Inch
Panasonic ER217 Hair and Beard Trimmer Wet & Dry
PANTENE Smooth and Silky Shampoo, 400 ml with Smooth and Silky Oil Replacement, 180 ml and 3 Minute Miracle Smooth and Silky Conditioner and Mask, 200 ml
Samsung Galaxy M11 Dual SIM - 32GB, 3GB RAM, 4G LTE - Black
Apple iPhone 11 Pro Max with FaceTime - 256GB, 4GB RAM, 4G LTE, Midnight Green, Dual SIM
Sharp SJ-BG615-SS Advanced No Frost Digital Refrigerator with Bottom Freezer and Two Doors, 468 Liters - Silver with SHARP R-20CR(S) Microwave, 20 Liters, 800 Watt - Silver
Apple iPad 2019 7th Gen - 10.2 inch Retina Display, Wi-Fi, 32GB, Gold
Pampers Sensitive Protect, 56 Wipes
Hoover DXOA38AC3R-ELA Front Loading Full Automatic Washing Machine, 8 Kg with Tornado TST-2200 Steam Iron, 2200 Watt
Gillette Fusion ProGlide Power Styler Razor
ATA 32 Inch HD LED Standard TV Black - 32DN4 LE
Apple iPhone SE - 128GB , 3GB RAM, 4G LTE, White - Single SIM and E-SIM
Samsung Galaxy M11 Dual SIM - 32GB, 3GB RAM, 4G LTE - Violet
Pampers Fresh Clean, 64 Wipes
Mintra Plastic Round Pot, 11cm- Black
LG F4R5VYG2E Vivace LED Display Steel Washing Machine, 9 kg - Black
Casio MTP-V001L-7BUDF Analog Leather Dress Watch for Men - Black, Quartz
Oral-B Gum and Enamel Care Ultrathin Extra Soft Toothbrush, 2 Pieces -Multi Color
Apple Iphone XS Max With Facetime - 64 GB, 4G LTE, Gold, 4 GB Ram, Single Sim & E-Sim
LG F4R5VGG2E Steam Washing Machine with Dryer, 9 Kilograms - Black Steel
Pampers Pants Diapers, Size 5, Junior, 12-18 kg, 52 Count
Toshiba GR-EF51GZ-XK Refrigerator with HOOVER DXOA38AC3R-ELA Full Automatic Washing Machine with La Germania 9M10G4A1X4AWW Cooker with Tornado 43EL8250E-B Shield 43 Inch TV with TOSHIBA VC-EA1600SE Vacuum Cleaner with Tornado MOM-C25BBE-S Microwave with Grill and Tornado EFS-360/90R Stand Fan
Braun Face Extra Sensitive Replacement Brush Refill , Duo Pack , 80-s Face
Apple iPhone SE - 64GB, 3GB RAM, 4G LTE, Red - Single SIM and E-SIM
Off Cliff Raglan Sleeves Top with Elastic-Waist Shorts Cotton Pajama Set for Men - Heather Grey & Heather White
Sharp SJ-58C(CH) Refrigerator with HOOVER DXOA38AC3R-ELA Full Automatic Washing Machine with La Germania 9M10Gub1X4Aww Cooker with Tornado 43EL8250E-B Shield TV with TOSHIBA VC-EA1600SE Vacuum Cleaner and Tornado EFS-360/90R Stand Fan
Nilco Tottery Tower Wooden Blocks
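To see why the original XPath returns a single title while a class-based selector returns all of them, here is a minimal sketch using the standard library's xml.etree.ElementTree, which supports a small XPath subset (positional [n] and [@attr='value'] predicates). ElementTree stands in for Selenium here purely to illustrate the matching rules, and the markup is hypothetical, simplified from the snippet in the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup modelled on the question's snippet:
# three product cards, each with the same title structure.
SAMPLE = """
<main>
  <div class="columns"><ul><li class="title-row"><h6 class="title"><span class="itemTitle"><a title="Item A">Item A</a></span></h6></li><li class="coupon-flag-row"/></ul></div>
  <div class="columns"><ul><li class="title-row"><h6 class="title"><span class="itemTitle"><a title="Item B">Item B</a></span></h6></li><li class="coupon-flag-row"/></ul></div>
  <div class="columns"><ul><li class="title-row"><h6 class="title"><span class="itemTitle"><a title="Item C">Item C</a></span></h6></li><li class="coupon-flag-row"/></ul></div>
</main>
"""

root = ET.fromstring(SAMPLE.strip())

# Positional path: div[1] pins the FIRST card, like the absolute XPath
# in the question, so only one title can match.
pinned = [a.text for a in root.findall("./div[1]/ul/li[1]/h6/span/a")]

# Class-based path: matches the title link inside every card.
all_titles = [a.text for a in root.findall(".//span[@class='itemTitle']/a")]

print(pinned)      # ['Item A']
print(all_titles)  # ['Item A', 'Item B', 'Item C']
```

The same principle carries over to Selenium: drop the positional indexes (or switch to a class-based CSS selector) and find_elements returns every match.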

If you would like to use the requests module instead, this code gives the same output:

import requests
from bs4 import BeautifulSoup

res = requests.get("https://deals.souq.com/eg-en/?utm_source=souq")
soup = BeautifulSoup(res.text, "html.parser")
# <h6 class="title"> > <span class="itemTitle"> > <a>
for item in soup.select(".title > .itemTitle > a"):
    print(item.text.strip())
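If you would rather not depend on BeautifulSoup or lxml at all, the same extraction can be sketched with the standard library's html.parser. This is a minimal version under stated assumptions: ItemTitleParser is a hypothetical helper name, it only looks at the class and title attributes shown in the question's snippet, and the sample markup is made up. For the live page you would feed it requests.get(url).text instead. Like the requests/BS4 approach, it does not execute JavaScript.

```python
from html.parser import HTMLParser


class ItemTitleParser(HTMLParser):
    """Collect the title attribute of each <a> inside <span class="itemTitle">."""

    def __init__(self):
        super().__init__()
        self.titles = []
        # One entry per currently open <span>: True if it has class "itemTitle".
        self._span_stack = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "span":
            classes = (attrs.get("class") or "").split()
            self._span_stack.append("itemTitle" in classes)
        elif tag == "a" and any(self._span_stack):
            title = attrs.get("title")
            if title:
                self.titles.append(title.strip())

    def handle_endtag(self, tag):
        if tag == "span" and self._span_stack:
            self._span_stack.pop()


# Hypothetical sample markup, modelled on the snippet in the question.
SAMPLE_HTML = """
<ul class="body no-bullet">
  <li class="title-row"><h6 class="title"><span class="itemTitle">
    <a href="https://example.com/a" title="Item A">Item A</a>
  </span></h6></li>
  <li class="coupon-flag-row"></li>
  <li class="title-row"><h6 class="title"><span class="itemTitle">
    <a href="https://example.com/b" title="Item B">Item B</a>
  </span></h6></li>
</ul>
"""

parser = ItemTitleParser()
parser.feed(SAMPLE_HTML)
parser.close()
for title in parser.titles:
    print(title)
```

A stack of span flags is used rather than a single boolean so that a plain <span> nested inside an itemTitle span does not switch matching off early.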
