
BeautifulSoup: find_all() returns an empty list

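After inspecting the page source of the site I'm scraping, I think the reason I can't get the content I want is that the div elements are not in the page source at all. I also tried CSS selectors (in another question, "BeautifulSoup: Why .select method returns an empty list?"), but that didn't work either. Here is some of my code: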

# Scraping top products sales and name from the Recommendation page

from selenium import webdriver
from bs4 import BeautifulSoup as bs
import json
import requests
import numpy as np
import pandas as pd

headers = {
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/84.0.4147.89 Safari/537.36',
    'cookie': '_gcl_au=1.1.961206468.1594951946; _med=refer; _fbp=fb.2.1594951949275.1940955365; SPC_IA=-1; SPC_F=y1evilme0ImdfEmNWEc08bul3d8toc33; REC_T_ID=fab983c8-c7d2-11ea-a977-ccbbfe23657a; SPC_SI=uv1y64sfvhx3w6dir503ixw89ve2ixt4; _gid=GA1.3.413262278.1594951963; SPC_U=286107140; SPC_EC=GwoQmu7TiknULYXKODlEi5vEgjawyqNcpIWQjoxjQEW2yJ3H/jsB1Pw9iCgGRGYFfAkT/Ej00ruDcf7DHjg4eNGWbCG+0uXcKb7bqLDcn+A2hEl1XMtj1FCCIES7k17xoVdYW1tGg0qaXnSz0/Uf3iaEIIk7Q9rqsnT+COWVg8Y=; csrftoken=5MdKKnZH5boQXpaAza1kOVLRFBjx1eij; welcomePkgShown=true; _ga=GA1.1.1693450966.1594951955; _dc_gtm_UA-61904553-8=1; REC_MD_30_2002454304=1595153616; _ga_SW6D8G0HXK=GS1.1.1595152099.14.1.1595153019.0; REC_MD_41_1000044=1595153318_0_50_0_49; SPC_R_T_ID="Am9bCo3cc3Jno2mV5RDkLJIVsbIWEDTC6ezJknXdVVRfxlQRoGDcya57fIQsioFKZWhP8/9PAGhldR0L/efzcrKONe62GAzvsztkZHfAl0I="; SPC_T_IV="IETR5YkWloW3OcKf80c6RQ=="; SPC_R_T_IV="IETR5YkWloW3OcKf80c6RQ=="; SPC_T_ID="Am9bCo3cc3Jno2mV5RDkLJIVsbIWEDTC6ezJknXdVVRfxlQRoGDcya57fIQsioFKZWhP8/9PAGhldR0L/efzcrKONe62GAzvsztkZHfAl0I="'
}
shopee_url = 'https://shopee.co.id/top_products'

navi_info = requests.get('https://shopee.co.id/api/v4/recommend/recommend?bundle=top_sold_product_microsite&limit=20&offset=0')
# extracts all the "index" data from all "sections"
index_arrays = [object_['index'] for object_ in navi_info.json()['data']['sections']]
index_array = index_arrays[0] # only one section with "index" key is present
# extract all catIDs from the "index" payload
catIDs = [object_['key'] for object_ in index_array]
params = {'catID': catIDs}
print(params)

# a = requests.get(link, headers=headers)
response = requests.get('https://shopee.co.id/top_products', params=params)
print(response.text)
soup = bs(response.text, 'html.parser')
products = soup.find_all('div', attrs={'class': '_3S8sOC _2QfAXF'})
print(products)  # Why does this return an empty list?
for product in products:
    name = product.select_one('#main > div > div.shopee-page-wrapper > div.page-product > div.container > div.product-briefing.flex.card._2cRTS4 > div.flex.flex-auto.k-mj2F > div > div.qaNIZv > span')
    sales = product.select_one('#main > div > div.shopee-page-wrapper > div.page-product > div.container > div.product-briefing.flex.card._2cRTS4 > div.flex.flex-auto.k-mj2F > div > div.flex._32fuIU > div.flex.SbDIui > div._22sp0A')
    print(name)
    print(sales)
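The code above already gets its category IDs from the `recommend` JSON API by reading `data`, `sections`, `index` and `key`. Because that endpoint returns JSON, it can be read directly instead of parsing HTML. The sketch below only reorganizes that extraction into a function that skips sections without an `index` key (the list comprehension above would raise `KeyError` on such a section). The payload shape is assumed from the question's own code, and `extract_cat_ids` and the sample payload are hypothetical, made up for illustration.

```python
from typing import Any, Dict, List


def extract_cat_ids(payload: Dict[str, Any]) -> List[Any]:
    """Collect every "key" from the "index" list of each section.

    Assumes the JSON shape used in the question:
    {"data": {"sections": [{"index": [{"key": ...}, ...]}, ...]}}
    Sections without an "index" entry are skipped instead of raising KeyError.
    """
    sections = payload.get("data", {}).get("sections", [])
    cat_ids = []
    for section in sections:
        for entry in section.get("index", []):
            cat_ids.append(entry["key"])
    return cat_ids


# Hypothetical sample payload; the real API response may differ.
sample = {
    "data": {
        "sections": [
            {"index": [{"key": "100001"}, {"key": "100002"}]},
            {"title": "a section without an index"},
        ]
    }
}

print({"catID": extract_cat_ids(sample)})
# With the real response this would be: extract_cat_ids(navi_info.json())
```

Unlike `index_arrays[0]`, this collects keys from every section that has an `index`; the two agree when exactly one such section exists, as the comment in the question states.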

You get no items from find_all() because the page's HTML, as downloaded by requests, contains no tag with the class '_3S8sOC _2QfAXF'.

You can easily check this by searching for all elements with that class:

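import requests
from bs4 import BeautifulSoup as bs

# Download the raw HTML (bytes) exactly as the server sends it; no JavaScript runs here
response = requests.get('https://shopee.co.id/top_products').content
soup = bs(response, "html.parser")
# Find every element whose class attribute is exactly "_3S8sOC _2QfAXF"
products = soup.find_all(class_="_3S8sOC _2QfAXF")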

Note that I'm using the content attribute of the response, and the class_ keyword argument of find_all() to get all elements with this specific class.
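One detail about `class_`: when the value contains a space, BeautifulSoup compares it with the exact string value of the `class` attribute, so `"_3S8sOC _2QfAXF"` would not match an element written as `class="_2QfAXF _3S8sOC"`. A CSS selector passed to `select()` requires both classes in any order. The HTML below is made up for illustration; on the real page both approaches come back empty, because the elements are not in the downloaded HTML.

```python
from bs4 import BeautifulSoup

# Made-up HTML: the same two classes, written in two different orders.
html = """
<div class="_3S8sOC _2QfAXF">first</div>
<div class="_2QfAXF _3S8sOC">second</div>
"""
soup = BeautifulSoup(html, "html.parser")

# A string with a space is compared with the whole class attribute value,
# so only the element whose classes appear in exactly this order matches.
exact = soup.find_all("div", class_="_3S8sOC _2QfAXF")

# A CSS selector requires both classes but ignores their order.
both = soup.select("div._3S8sOC._2QfAXF")

print([tag.get_text() for tag in exact])  # ['first']
print([tag.get_text() for tag in both])   # ['first', 'second']
```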

Unfortunately, there are no such elements:

print(products)

This prints:

[]
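The empty list means the HTML that `requests` downloaded does not contain those elements. A common cause, and probably the one here, is a JavaScript-rendered page: the server sends a mostly empty shell and the product cards are added later in the browser, which `requests` never does. A quick check like the sketch below can be run on `response.text` before writing selectors; `has_elements` and the sample shell are hypothetical. If it reports `False`, the data has to come from the site's JSON API (as the question's code already does) or from a browser-driven tool such as Selenium.

```python
from bs4 import BeautifulSoup


def has_elements(html: str, selector: str) -> bool:
    """Return True if the raw HTML contains at least one match for `selector`."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.select_one(selector) is not None


# Hypothetical shell of a client-rendered page: the script fills in #main
# after the browser loads it, so the static HTML has no product divs.
shell = """
<html><body>
  <div id="main"></div>
  <script src="/bundle.js"></script>
</body></html>
"""

print(has_elements(shell, "div._3S8sOC._2QfAXF"))  # False
print(has_elements(shell, "#main"))                # True
# With the real page: has_elements(response.text, "div._3S8sOC._2QfAXF")
```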
