JSONDecodeError webscraping with Python3.9 and BeautifulSoup 4

I'm trying to scrape some TrustPilot reviews for a given brand. Here is my code:

import requests
from bs4 import BeautifulSoup
import time
import json

headers = {"User-Agent":"Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:47.0) Gecko/20100101 Firefox/47.0"}
#def get_total_items(url):

#soup = BeautifulSoup(requests.get(url, format(0),headers).text, 'lxml')
stars = []
dates = []
results = []
with requests.Session() as s:
    for num in range(1,2):
        url = "https://www.trustpilot.com/review/www.hiwaldo.com?page={}".format(num)
        r = s.get(url, headers = headers)
        soup = BeautifulSoup(r.content, 'lxml')

        for star in soup.find_all("section", {"class":"review__content"}):

            # Get rating value
            rating = star.find("div", {"class":"star-rating star-rating--medium"}).find('img').get('alt')

            # Get date value
            date_json = json.loads(star.find('script').text)
            date = date_json['publishedDate']

            stars.append(rating)
            dates.append(date)

            data = {"Rating": rating, "Date": date}
            results.append(data)

        time.sleep(2)


print(results)

When I run python3 ~/Desktop/reviews.py, I get the following error message:

Traceback (most recent call last):
      File "/Users/user/Desktop/reviews.py", line 25, in <module>
        date_json = json.loads(star.find('script').text)
      File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/__init__.py", line 346, in loads
        return _default_decoder.decode(s)
      File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/decoder.py", line 337, in decode
        obj, end = self.raw_decode(s, idx=_w(s, 0).end())
      File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/json/decoder.py", line 355, in raw_decode
        raise JSONDecodeError("Expecting value", s, err.value) from None
    json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Is there anything obviously wrong with this setup? I'm a complete Python newbie, in case that wasn't already obvious.

Thanks in advance!

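To extract the JSON data from star, use the .string attribute instead of .text. The message "Expecting value: line 1 column 1 (char 0)" is what json.loads raises when its input is empty or does not start with a JSON value, which suggests .text is not returning the script's contents here.

So instead of: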

date_json = json.loads(star.find('script').text)

use:

date_json = json.loads(star.find('script').string)
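
Even with .string, the value can be None (BeautifulSoup returns None when the tag does not contain a single string), and a script tag may hold something other than JSON. A minimal sketch of a defensive wrapper follows; the helper name load_script_json and the sample payload are hypothetical and not part of the original code.

```python
import json
from typing import Any, Optional


def load_script_json(raw: Optional[str]) -> Optional[Any]:
    """Parse the contents of a <script> tag as JSON, or return None.

    `raw` is meant to be the value of `tag.string`, which is None when
    the tag does not contain a single string.
    """
    if raw is None or not raw.strip():
        # Nothing to parse; json.loads("") would raise
        # "Expecting value: line 1 column 1 (char 0)".
        return None
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # The script held something other than JSON (e.g. JavaScript).
        return None


# Hypothetical sample payload, standing in for star.find('script').string
sample = '{"publishedDate": "2021-03-01T12:00:00Z"}'
parsed = load_script_json(sample)
print(parsed["publishedDate"] if parsed else "no date found")
```

In the loop from the question this could be called as load_script_json(star.find('script').string), skipping the review when it returns None. Note that star.find('script') itself returns None when no script tag is present, so that case would need its own check.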

