
Python3, Scrape returned []

I have a Python script for scraping a site. I need to scrape a few pieces of information (name, price, link, ID), which will then be stored in MongoDB. But I have a problem with my scraping function: it returns an empty list.

Can you please help me with this? Sorry for my English, and thank you in advance.

from bs4 import BeautifulSoup
import requests
import time
import pymongo
import difflib
import functools

URL = 'https://www.nickollsandperks.co.uk/New-and-Special-Offers/New-Whisky?order=relevance:asc'
headers = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/39.0.2171.95 Safari/537.36'}


def newScrapItems():
    # global productNames,Names, links, Price, products_info_final
    content3 = requests.get(URL, headers=headers)
    soup3 = BeautifulSoup(content3.text, 'html.parser')
    itemList = []


    for products in soup3.find_all("div", {"class": 'facets-facet-browse-items'}):
        for products_info in products.find_all("div", {"class": "facets-items-collection-view-row"}):
            for products_info_final in products_info.find_all("div", {"class": "facets-item-cell-list"}):
                for generalInfo in products_info_final.find_all("div", {"class": "facets-item-cell-list-right"}):
                    for links in generalInfo.find_all("meta"):
                        for itemNamesNext in products_info.find_all("div", {"class": "item-title-description"}):
                            for prePrice in generalInfo.find_all("div", {"class": "item-button"}):
                                for Names in itemNamesNext.find_all("span"):
                                    for priceInfo in prePrice.find_all("div", {"class": "ProductViewsPrice.Price"}):
                                        for Price in priceInfo.find_all("span", {"class": "product-views-price-lead"}):
                                            productNames = {}

                                            productNames['price'] = Price.get_text()
                                            productNames['name'] = Names.get_text()
                                            productNames['link'] = links['content']
                                            productNames['ID'] = products_info_final['data-item-id']
                                            itemList.append(productNames)


    return itemList
newItems = newScrapItems()
print(newScrapItems())

It returns:

[]

Process finished with exit code 0

I searched for this problem but found nothing useful. I really hope some kind person can help me, since I have been struggling with this for a couple of days.

I implemented it the same way for another site, but with less nesting:

    content3 = requests.get(URL, headers=headers)
    soup3 = BeautifulSoup(content3.text, 'html.parser')
    newItemList = []
    for products in soup3.find_all("li", {"class": 'product-item'}):
        for products_info in products.find_all("strong", {"class": "product-item-name"}):
            for name in products_info.find_all("a"):
                for productsPriceInfo in products.find_all("div", {"class": "price-box price-final_price"}):
                    for productsPriceInfoAdv in productsPriceInfo.find_all("span", {
                        "class": "price-wrapper price-including-tax"}):
                        for finalPrice in productsPriceInfoAdv.find_all("span", {"class": "price"}):
                            productNames = {}
                    productNames['name'] = name['title']
                    productNames['price'] = finalPrice.get_text()
                    productNames['link'] = name['href']
                    productNames['ID'] = productsPriceInfo['data-product-id']
                    # dict = {'names': name['title']}
                    newItemList.append(productNames)

    return newItemList

This returned results in the format name: "name", link: "link", etc.

Rewrite the function to use a single loop over a collection of parent elements. Choose a parent that encompasses all the data you want as blocks, one block per product; then, inside the loop, select relative to that block to get the items within it.
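
One way to see why the nested version in the question can come back empty: the innermost body only runs if every `find_all` along the chain matches at least once, so a single selector that matches nothing empties the whole result. Below is a minimal sketch for checking each selector on its own. It assumes a hypothetical HTML fragment (the live page's markup may differ), and `count_matches` is an illustrative helper, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for one product block; the real page may differ.
html = """
<div class="facets-item-cell-list" data-item-id="101">
  <div class="item-title-description"><span>Example Whisky</span></div>
  <div class="item-button">
    <span class="product-views-price-lead">45.00</span>
  </div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")


def count_matches(soup, selectors):
    """Return how many elements each CSS selector matches."""
    return {sel: len(soup.select(sel)) for sel in selectors}


selectors = [
    "div.facets-item-cell-list",
    "div.item-title-description span",
    "div.item-button",
    'div[class="ProductViewsPrice.Price"]',  # class value used in the question
    "span.product-views-price-lead",
]

counts = count_matches(soup, selectors)
for sel, n in counts.items():
    # A count of 0 marks the level at which a nested loop would stop producing items.
    print(f"{n:3d}  {sel}")
```

Running the same check against the HTML actually returned by `requests` should show which level of the nesting matches nothing. If every count is zero, the downloaded markup probably differs from what the browser displays.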

I have assumed the id is the product SKU.

import requests
import numpy as np
from bs4 import BeautifulSoup as bs

newItemList = []
base = 'https://www.nickollsandperks.co.uk/'

r = requests.get('https://www.nickollsandperks.co.uk/New-and-Special-Offers/New-Whisky?order=relevance:asc')
soup = bs(r.content, 'lxml')

# One iteration per product block; every field is selected relative to that block
for listing in soup.select('.facets-items-collection-view-cell-span12'):
    price = listing.select_one('.product-views-price')
    if price is not None:  # not every listing has a price
        price = price.text
    else:
        price = np.nan  # placeholder for a missing price

    newItemList.append(
       {'name': listing.select_one('.facets-item-cell-list-name span').text,
        'link': base + listing.select_one('.facets-item-cell-list-name')['href'],
        'price': price,
        'id': listing.select_one('.facets-item-cell-list')['data-sku']
       }
    )
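
The same single-loop pattern can be tried offline before pointing it at the live site. The sketch below runs against a hypothetical two-product HTML string whose class names mirror the selectors above; the real page structure is not verified here. `text_or_none` and `parse_listings` are illustrative helpers, and `urljoin` is swapped in for plain string concatenation so a leading slash in `href` does not produce a double slash.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE = "https://www.nickollsandperks.co.uk/"

# Hypothetical markup: two product blocks, the second one without a price.
HTML = """
<div class="facets-items-collection-view-cell-span12">
  <div class="facets-item-cell-list" data-sku="SKU-1">
    <a class="facets-item-cell-list-name" href="/whisky-a"><span>Whisky A</span></a>
    <span class="product-views-price">45.00</span>
  </div>
</div>
<div class="facets-items-collection-view-cell-span12">
  <div class="facets-item-cell-list" data-sku="SKU-2">
    <a class="facets-item-cell-list-name" href="/whisky-b"><span>Whisky B</span></a>
  </div>
</div>
"""


def text_or_none(parent, selector):
    """Return the stripped text of the first match under parent, or None."""
    node = parent.select_one(selector)
    return node.get_text(strip=True) if node is not None else None


def parse_listings(html, base=BASE):
    """Build one dict per product block, tolerating missing fields."""
    soup = BeautifulSoup(html, "html.parser")
    items = []
    for listing in soup.select(".facets-items-collection-view-cell-span12"):
        link = listing.select_one("a.facets-item-cell-list-name")
        cell = listing.select_one(".facets-item-cell-list")
        has_href = link is not None and link.has_attr("href")
        items.append({
            "name": text_or_none(listing, ".facets-item-cell-list-name span"),
            "link": urljoin(base, link["href"]) if has_href else None,
            "price": text_or_none(listing, ".product-views-price"),
            "id": cell.get("data-sku") if cell is not None else None,
        })
    return items


items = parse_listings(HTML)
for item in items:
    print(item)
```

Because each field is looked up independently, a product with no price still yields a row (with `price` set to `None`) instead of silently disappearing, which is what the nested-loop version does.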


 