How do I extract the text of a strong tag inside an h4?

Question

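I'm trying to extract the "Overall Rating" (the numeric value inside the strong tag) from every product page. An example page is https://www.guitarguitar.co.uk/product/12082017334688--epiphone-les-paul-standard-plus-top-pro-translucent-blue and the relevant structure looks like this: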

  <div class="col-sm-12"> 
   <h2 class="line-bottom"> Customer Reviews</h2>
   <h4>
   Overall Rating
   <strong>5</strong>
   <span></span>
  </h4>
  </div>

I'm trying to extract only the value inside the strong tag.

 productsRating = soup.find("div", {"class": "col-sm-12"}).h4

This sometimes works, but the page uses the same class for different elements, so it also pulls in HTML elements I don't want.

Is there a way to get only the product's overall rating?

EDIT:

Here is the whole loop of my program.

for page in range(1, 2):
    guitarPage = requests.get('https://www.guitarguitar.co.uk/guitars/electric/page-{}'.format(page)).text
    soup = BeautifulSoup(guitarPage, 'lxml')
    guitars = soup.find_all(class_='col-xs-6 col-sm-4 col-md-4 col-lg-3')

    for guitar in guitars:

        title_text = guitar.h3.text.strip()
        print('Guitar Name: ', title_text)
        price = guitar.find(class_='price bold small').text.strip()
        trim = re.compile(r'[^\d.,]+')
        int_price = trim.sub('', price)
        print('Guitar Price: ', int_price)

        priceSave = guitar.find('span', {'class': 'price save'})
        if priceSave is not None:
            priceOf = priceSave.text
            trim = re.compile(r'[^\d.,]+')
            int_priceOff = trim.sub('', priceOf)
            print('Save: ', int_priceOff)
        else:
            print("No discount!")

        image = guitar.img.get('src')
        print('Guitar Image: ', image)

        productLink = guitar.find('a').get('href')
        linkProd = url + productLink
        print('Link of product', linkProd)
        productsPage.append(linkProd)

        for products in productsPage:
            response = requests.get(products)
            soup = BeautifulSoup(response.content, "lxml")
            productsDetails = soup.find("div", {"class": "description-preview"})
            if productsDetails is not None:
                description = productsDetails.text
                print('product detail: ', description)
            else:
                print('none')
            time.sleep(0.2)
            productsRating = soup.find_all('strong')[0].text
            print(productsRating)

Answer 1

Try:

import requests
from bs4 import BeautifulSoup 

url = 'https://www.guitarguitar.co.uk/product/190319340849008--gibson-les-paul-standard-60s-iced-tea'

html = requests.get(url).text

soup = BeautifulSoup(html, "lxml")
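try:
    # An h2's .string can be None, so guard before the substring test;
    # compare case-insensitively because the page shows "Customer Reviews".
    heading = soup.find('h2', string=lambda s: s is not None and "customer reviews" in s.lower())
    productsRating = heading.find_next_sibling('h4').find('strong').text
except AttributeError:  # heading, h4 or strong was not found
    productsRating = None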

print(productsRating)
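
A variation on this idea can be checked offline against the snippet from the question. The following is only a minimal sketch: it matches the h4 by its own "Overall Rating" text instead of the shared class, uses the built-in html.parser, and assumes the markup is exactly as posted. The helper name extract_overall_rating and the extra "Delivery" block are made up for illustration.

```python
from bs4 import BeautifulSoup

# Markup copied from the question. The first col-sm-12 block is an invented
# addition (an assumption) to mimic other elements reusing the same class.
SAMPLE_HTML = """
<div class="col-sm-12"><h4>Delivery <strong>Free</strong></h4></div>
<div class="col-sm-12">
  <h2 class="line-bottom"> Customer Reviews</h2>
  <h4>
  Overall Rating
  <strong>5</strong>
  <span></span>
  </h4>
</div>
"""


def extract_overall_rating(html):
    """Return the text of the <strong> inside the 'Overall Rating' <h4>, or None."""
    soup = BeautifulSoup(html, "html.parser")
    # Match on the heading's visible text rather than on the shared CSS class.
    for h4 in soup.find_all("h4"):
        if "Overall Rating" in h4.get_text():
            strong = h4.find("strong")
            return strong.get_text(strip=True) if strong else None
    return None


print(extract_overall_rating(SAMPLE_HTML))
```

In the scraping loop, the same helper could be called with response.content for each product page; a None result would then mean the page has no rating block.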

Answer 2

The review information is all in a script tag, which you can extract and load with json. It should be simple to adapt this to your loop.

import requests
from bs4 import BeautifulSoup as bs
import json

url = 'https://www.guitarguitar.co.uk/product/12082017334688--epiphone-les-paul-standard-plus-top-pro-translucent-blue'
r = requests.get(url)
soup = bs(r.content, 'lxml')
script = soup.select_one('[type="application/ld+json"]').text
data = json.loads(script.strip())
overall_rating = data['@graph'][2]['aggregateRating']['ratingValue']
reviews = [review for review in data['@graph'][2]['review']] #extract what you want

Explore the JSON to see which other fields you can extract.

To handle products with no reviews, you can use a simple try/except:

import requests
from bs4 import BeautifulSoup as bs
import json

url = 'https://www.guitarguitar.co.uk/product/190319340849008--gibson-les-paul-standard-60s-iced-tea'
r = requests.get(url)
soup = bs(r.content, 'lxml')
script = soup.select_one('[type="application/ld+json"]').text
data = json.loads(script.strip())
try:
    overall_rating = data['@graph'][2]['aggregateRating']['ratingValue']
    reviews = [review for review in data['@graph'][2]['review']] #extract what you want
except (KeyError, IndexError):  # raised when the product has no rating or reviews
    overall_rating = "None"
    reviews = ['None']

Alternatively, use an if statement:

if 'aggregateRating' in script:
    overall_rating = data['@graph'][2]['aggregateRating']['ratingValue']
    reviews = [review for review in data['@graph'][2]['review']] #extract what you want
else:
    overall_rating = "None"
    reviews = ['None']
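
The hard-coded index [2] assumes the product entry is always the third item in @graph. As a hedged alternative, the sketch below scans @graph for the first entry that carries an aggregateRating. The sample payload and the helper name find_rating are hypothetical, and the real page's JSON may be shaped differently.

```python
import json

# Hypothetical JSON-LD payload for illustration; not copied from the real page.
SAMPLE_LD_JSON = """
{"@graph": [
  {"@type": "Organization", "name": "Example Shop"},
  {"@type": "WebSite", "name": "Example Shop"},
  {"@type": "Product", "name": "Example Guitar",
   "aggregateRating": {"@type": "AggregateRating", "ratingValue": "5", "reviewCount": "3"},
   "review": [{"@type": "Review", "reviewBody": "Great guitar"}]}
]}
"""


def find_rating(data):
    """Return (ratingValue, reviews) from the first @graph node that has a rating."""
    for node in data.get("@graph", []):
        rating = node.get("aggregateRating")
        if rating is not None:
            return rating.get("ratingValue"), node.get("review", [])
    # No node carried a rating, e.g. a product without reviews.
    return None, []


overall_rating, reviews = find_rating(json.loads(SAMPLE_LD_JSON))
print(overall_rating, len(reviews))
```

With this shape, a product without reviews yields None and an empty list, so no try/except is needed around the lookup.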
