How to get values and item names from a website using Beautiful Soup

Question

This is a bit of a basic bs4 question, but I've been trying for hours!

import urllib.request
from bs4 import BeautifulSoup

url = 'https://www.currys.co.uk/gbuk/search-keywords/xx_xx_xx_xx_xx/acer/xx-criteria.html'
r = urllib.request.urlopen(url).read()
soup = BeautifulSoup(r, 'lxml')

price = soup.find_all("div", class_="productPrices")

How do I now extract the price? In this case, that is the text of the <strong class="price" data-product="price"> tag.
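Since find_all returns a list of div elements, each one can be searched again for the nested <strong data-product="price"> tag. A minimal sketch, assuming the markup looks like the hypothetical SAMPLE_HTML below (the live Currys page may be structured differently); extract_prices is an illustrative helper name:

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the tags named in the question;
# the live page's structure may differ.
SAMPLE_HTML = """
<div class="productPrices">
  <strong class="price" data-product="price">
    £99.99
  </strong>
</div>
<div class="productPrices">
  <strong class="price" data-product="price">£399.97</strong>
</div>
"""


def extract_prices(html):
    """Return the text of every price tag nested in a productPrices div."""
    soup = BeautifulSoup(html, "html.parser")
    prices = []
    for block in soup.find_all("div", class_="productPrices"):
        # Search only inside this div for the price <strong> tag
        tag = block.find("strong", attrs={"data-product": "price"})
        if tag is not None:
            # strip=True trims the surrounding whitespace and newlines
            prices.append(tag.get_text(strip=True))
    return prices


print(extract_prices(SAMPLE_HTML))
```

A CSS selector does the same lookup in one call: soup.select('div.productPrices strong[data-product="price"]').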

I also want to extract the product SKU, which appears in the page as "productSKU":"200341".
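The "productSKU":"200341" fragment looks like JSON embedded in a script rather than an HTML tag, so BeautifulSoup's tag search cannot reach it directly; a regular expression over the raw page text is one simple way to pull the values out. A minimal sketch, assuming the SKU is always a quoted string of digits; SAMPLE_PAGE and extract_skus are hypothetical stand-ins:

```python
import re

# Hypothetical snippet of script data embedded in the page;
# the real page's JSON layout is an assumption here.
SAMPLE_PAGE = '''
<script>
var data = [{"productSKU":"201795"}, {"productSKU":"200341"}];
</script>
'''

# Capture the digits inside the quotes that follow "productSKU":
SKU_PATTERN = re.compile(r'"productSKU"\s*:\s*"(\d+)"')


def extract_skus(page_text):
    """Return every productSKU value found in the raw page text, in order."""
    return SKU_PATTERN.findall(page_text)


print(extract_skus(SAMPLE_PAGE))
```

If the surrounding script holds well-formed JSON, loading it with json.loads would be sturdier than a regex, but that depends on page structure the question does not show.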

I'd like to loop through all pages of results matching my search (in this case just "acer") and store ALL matching SKUs and prices in a DataFrame.
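The question does not say how Currys numbers its result pages, so the loop below is only a sketch: fetch_page, parse_page and max_pages are hypothetical names, and stopping at the first page that yields no products is an assumption about how the site behaves past the last page. The demo feeds it canned text (with SKUs and prices taken from the answer's output) instead of live requests:

```python
from typing import Callable, Dict, List


def scrape_all_pages(fetch_page: Callable[[int], str],
                     parse_page: Callable[[str], List[Dict[str, str]]],
                     max_pages: int = 50) -> List[Dict[str, str]]:
    """Collect product rows from page 1, 2, ... until a page yields none.

    fetch_page(n) is a hypothetical hook returning the HTML of results page n;
    parse_page turns that HTML into dicts such as {"sku": ..., "price": ...}.
    """
    rows: List[Dict[str, str]] = []
    for page_number in range(1, max_pages + 1):
        page_rows = parse_page(fetch_page(page_number))
        if not page_rows:  # an empty page means we ran past the last result
            break
        rows.extend(page_rows)
    return rows


# Demo with canned pages instead of live requests
FAKE_PAGES = {1: "201795 £99.99;200341 £399.97", 2: "156512 £119.99"}


def fake_fetch(n):
    return FAKE_PAGES.get(n, "")


def fake_parse(text):
    # Each ";"-separated item is "<sku> <price>"
    return [dict(zip(("sku", "price"), item.split()))
            for item in text.split(";") if item]


rows = scrape_all_pages(fake_fetch, fake_parse)
print(rows)
```

With real hooks, pandas.DataFrame(rows) turns the collected list of dicts into a DataFrame with sku and price columns. max_pages is a safety cap in case the site keeps serving the last page instead of an empty one.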

Answer 1

You can try this:

import re
from collections import namedtuple

import requests
from bs4 import BeautifulSoup

product = namedtuple('product', ['name', 'price', 'sku'])

url = 'https://www.currys.co.uk/gbuk/search-keywords/xx_xx_xx_xx_xx/acer/xx-criteria.html'
page_data = requests.get(url).text
page = BeautifulSoup(page_data, 'html.parser')  # parse once and reuse

# Names and prices are tagged with data-product="name" / data-product="price"
names = [i.text for i in page.find_all('span', {'data-product': 'name'})]
prices = [re.sub(r'\s+', '', i.text) for i in page.find_all('strong', {'data-product': 'price'})]
# SKUs sit in the page's embedded JSON; map the captured name (last character trimmed,
# it may be a stray backslash when the name contains an escaped quote) to its SKU
skus = {b[:-1]: a for a, b in re.findall(r'"productSKU":"(.*?)","productName":"(.*?)"', page_data)}
# Pair each name with its price, then take the SKU whose JSON name it contains
final_product_data = [product(a, b, next(h for c, h in skus.items() if c in a))
                      for a, b in zip(names, prices)]
print([(i.name, i.price, i.sku) for i in final_product_data])

Output:

[('KG221Q Full HD 21.5" LED Monitor - Black', '£99.99', '201795'), ('C22-760 21.5" All-in-One PC - Silver', '£399.97', '200341'), ('KG271 Full HD 27" LED Gaming Monitor - Black', '£179.99', '201797'), ('S242HLDBID Full HD 24" LED Monitor', '£119.99', '156512'), ('CB3-431 14" Full HD Chromebook - Silver', '£299.99', '169493'), ('CB3-431 14" Full HD Chromebook - Gold', '£299.99', '169493'), ('Iconia One 10 B3-A40 10.1" Tablet - 16 GB, White', '£139.99', '214589'), ('14 CB3-431 Chromebook - Silver', '£249.99', '183981'), ('ED242QRwi Full HD 24" Curved LCD Monitor - White', '£119.99', '224620'), ('Aspire E15 15.6" Laptop - Black', '£699.99', '204284'), ('11 CB3-131 Chromebook - White', '£199.97', '165016'), ('R241Ybmid Full HD 23.8" LED Monitor', '£134.99', '164002'), ('CB3-131 11.6" Chromebook - Blue', '£199.99', '214340'), ('15 CB3-532 Full HD Chromebook - Iron', '£279.99', '191983'), ('Chromebook R 13 CB5-312T 2-in-1 - Silver', '£399.99', '180082'), ('14 CB3-431 Chromebook - Gold', '£249.99', '183980'), ('Iconia One 10 B3-A40 10.1" Tablet - 32 GB, Black', '£149.99', '214589'), ('Swift 3 SF314-52 14" Laptop - Silver', '£649.99', '205493'), ('C24-760 23.8" All-in-One PC - Silver', '£599.99', '200448'), ('Chromebook R 11 CB5-132T 2-in-1 - White', '£279.99', '183985')]

Your data is now stored as a list of namedtuple objects, so each field can be accessed by name (for example item.price).
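Since the question's goal is a DataFrame, the namedtuple list converts directly with pandas. A minimal sketch using two rows copied from the output above; the price_gbp column is an illustrative addition that assumes prices are formatted like "£99.99" or "£1,299.99":

```python
from collections import namedtuple

import pandas as pd

# Same shape as the answer's namedtuple; two rows copied from its output
product = namedtuple('product', ['name', 'price', 'sku'])
final_product_data = [
    product('KG221Q Full HD 21.5" LED Monitor - Black', '£99.99', '201795'),
    product('C22-760 21.5" All-in-One PC - Silver', '£399.97', '200341'),
]

# Passing the field names keeps the columns in namedtuple order
df = pd.DataFrame(final_product_data, columns=product._fields)
# Turn the price strings into floats so they can be sorted and filtered
df['price_gbp'] = (df['price'].str.lstrip('£')
                   .str.replace(',', '', regex=False)
                   .astype(float))
print(df)
```

Keeping sku as a string preserves any leading zeros; convert it with astype(int) only if the SKUs are known to be purely numeric.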

Answer 1: 1, 2018-02-04 20:09:37
