
I am trying to open a JSON file and extract data from specific fields in Python (JupyterLab)

I have thousands of Amazon product reviews stored as a JSON file. I need to process the data in Python and extract the fields "reviewText", "overall", and "summary".

The JSON file looks like this:

{"reviewerID": "A11N155CW1UV02", "asin": "B000H00VBQ", "reviewerName": "AdrianaM", "helpful": [0, 0], "reviewText": "I had big expectations because I love English TV, in particular Investigative and detective stuff but this guy is really boring. It didn't appeal to me at all.", "overall": 2.0, "summary": "A little bit boring for me", "unixReviewTime": 1399075200, "reviewTime": "05 3, 2014"}
{"reviewerID": "A3BC8O2KCL29V2", "asin": "B000H00VBQ", "reviewerName": "Carol T", "helpful": [0, 0], "reviewText": "I highly recommend this series. It is a must for anyone who is yearning to watch \"grown up\" television. Complex characters and plots to keep one totally involved. Thank you Amazin Prime.", "overall": 5.0, "summary": "Excellent Grown Up TV", "unixReviewTime": 1346630400, "reviewTime": "09 3, 2012"}
{"reviewerID": "A60D5HQFOTSOM", "asin": "B000H00VBQ", "reviewerName": "Daniel Cooper \"dancoopermedia\"", "helpful": [0, 1], "reviewText": "This one is a real snoozer. Don't believe anything you read or hear, it's awful. I had no idea what the title means. Neither will you.", "overall": 1.0, "summary": "Way too boring for me", "unixReviewTime": 1381881600, "reviewTime": "10 16, 2013"}

I am trying this:

import json

with open('Amazon_Instant_Video_5.json') as json_file:
    data = json.load(json_file)
print(data['reviewText']['overal']['summary'])

But it gives me this error:

JSONDecodeError                           Traceback (most recent call last)
/var/folders/76/9lhw7d657y757vg308n_thww0000gn/T/ipykernel_4272/378691339.py in <module>
      2 
      3 with open('Amazon_Instant_Video_5.json') as json_file:
----> 4     data = json.load(json_file)
      5 print(data['reviewText']['overal']['summary'])

~/opt/anaconda3/lib/python3.9/json/__init__.py in load(fp, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    291     kwarg; otherwise ``JSONDecoder`` is used.
    292     """
--> 293     return loads(fp.read(),
    294         cls=cls, object_hook=object_hook,
    295         parse_float=parse_float, parse_int=parse_int,

~/opt/anaconda3/lib/python3.9/json/__init__.py in loads(s, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    344             parse_int is None and parse_float is None and
    345             parse_constant is None and object_pairs_hook is None and not kw):
--> 346         return _default_decoder.decode(s)
    347     if cls is None:
    348         cls = JSONDecoder

~/opt/anaconda3/lib/python3.9/json/decoder.py in decode(self, s, _w)
    338         end = _w(s, end).end()
    339         if end != len(s):
--> 340             raise JSONDecodeError("Extra data", s, end)
    341         return obj
    342 

JSONDecodeError: Extra data: line 2 column 1 (char 394)

This is the JSON Lines format: each line is a separate JSON string. Read the file one line at a time and pass each line to json.loads().

import json

with open('Amazon_Instant_Video_5.json') as json_file:
    for line in json_file:
        data = json.loads(line)
        print(data['reviewText'], data['overall'], data['summary'])
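
If the goal is to keep the three fields for later processing rather than print them, the same loop can collect them into a list. The following is a minimal sketch, not part of the original answer: the helper name extract_fields and the FIELDS tuple are hypothetical, the file is assumed to be UTF-8 encoded, and blank lines are assumed to be safe to skip.

```python
import json

# Field names taken from the question; adjust as needed.
FIELDS = ("reviewText", "overall", "summary")


def extract_fields(path, fields=FIELDS):
    """Return one dict per review, keeping only the requested fields."""
    rows = []
    # Assumption: the file is UTF-8 encoded.
    with open(path, encoding="utf-8") as json_file:
        for line in json_file:
            line = line.strip()
            if not line:
                # Skip blank lines, which json.loads() would reject.
                continue
            record = json.loads(line)
            # .get() yields None when a review lacks one of the fields.
            rows.append({name: record.get(name) for name in fields})
    return rows
```

Calling extract_fields('Amazon_Instant_Video_5.json') would then return a list of dictionaries, one per review, which can be passed on to whatever processing comes next.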

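The "Extra data" error occurs because json.load() expects the whole file to be a single JSON object. After scanning the first line it considers that object complete, so everything from line 2 column 1 onward is reported as extra data. Note also that "reviewText", "overall", and "summary" are separate keys of each record, so they are read as data['reviewText'], data['overall'], and data['summary'] rather than chained together.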
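
The behaviour can be reproduced without the data file. The sketch below uses a made-up two-record string (two_records is a hypothetical stand-in for the file contents) to show that parsing the whole text at once fails, while parsing it line by line succeeds.

```python
import json

# Hypothetical stand-in for a JSON Lines file: one JSON object per line.
two_records = '{"overall": 2.0}\n{"overall": 5.0}\n'

try:
    # Parsing the whole text at once fails: the decoder finishes the first
    # object and then finds more content after it.
    json.loads(two_records)
    message = None
except json.JSONDecodeError as err:
    message = str(err)
    print(message)  # an "Extra data" error pointing at line 2 column 1

# Parsing each line separately works.
parsed = [json.loads(line) for line in two_records.splitlines()]
print(parsed)
```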

