ValueError: Expected object or value <-> Can't load a json file to pandas dataframe, or convert to csv, either will suffice

I have a JSON file of roughly 1.5 GB that I need to use as a dataframe. I've spent 10 hours straight trying to get it to load, going through every answered question I could find on StackOverflow. As a second option I tried to convert it to CSV and then load that as a dataframe, but that fails as well. The previously answered questions mostly just explain the error rather than giving code. Here is what the JSON looks like:

{'work': '2505753', 'flags': [], 'unixtime': 1260403200, 'stars': 1.0, 'nhelpful': 0, 'time': 'Dec 10, 2009', 'comment': "I really thought that I would like this book. I'm fascinated by this time period, and the plots to assassinate Hitler have always intrigued me. However, this book was so boring that I had to force myself to read it. The author no doubt has a commanding vocabulary, but his writing style and word choices made the book a chore to read. I've read dry textbooks that had more life to them than this novel. ", 'user': 'schatzi'}
{'work': '12458291', 'flags': [], 'unixtime': 1361664000, 'stars': 4.0, 'nhelpful': 0, 'time': 'Feb 24, 2013', 'comment': "After her father's death, Lena discovers that her father had been keeping many secrets from her. Lena is a member of the. Silenti, telepaths who came to our world through a portal. She must learn to navigate through the social, religious, and political pitfalls of her new life. Who can she trust? What will her role be? I enjoyed this story and the world the author created very much. ", 'user': 'aztwinmom'}

I tried this code as a second option, to convert to CSV. The error I debugged comes down to the single quotes, but replacing "\'" with "\"" across data this large would take an enormous amount of time.

Attempt with json

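import json
import csv

f = open('test.json')
data = json.load(f)
f.close()

# write the first 10 records to a CSV file
with open('data.csv', 'w', newline='') as f:  # output must be opened for writing
    csv_file = csv.writer(f)
    for count, item in enumerate(data):
        if count == 10:
            break
        csv_file.writerow(item.values())  # writerow belongs to the writer, not the file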

Traceback

---------------------------------------------------------------------------
JSONDecodeError                           Traceback (most recent call last)
<ipython-input-115-d75bae392cae> in <module>
      1 f = open('test.json')
----> 2 data = json.load(f)
      3 f.close()

e:\Anaconda3\lib\json\__init__.py in load(fp, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    294         cls=cls, object_hook=object_hook,
    295         parse_float=parse_float, parse_int=parse_int,
--> 296         parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
    297 
    298 

e:\Anaconda3\lib\json\__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
    346             parse_int is None and parse_float is None and
    347             parse_constant is None and object_pairs_hook is None and not kw):
--> 348         return _default_decoder.decode(s)
    349     if cls is None:
    350         cls = JSONDecoder

e:\Anaconda3\lib\json\decoder.py in decode(self, s, _w)
    335 
    336         """
--> 337         obj, end = self.raw_decode(s, idx=_w(s, 0).end())
    338         end = _w(s, end).end()
    339         if end != len(s):

e:\Anaconda3\lib\json\decoder.py in raw_decode(self, s, idx)
    351         """
    352         try:
--> 353             obj, end = self.scan_once(s, idx)
    354         except StopIteration as err:
    355             raise JSONDecodeError("Expecting value", s, err.value) from None

JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
  • pd.read_json('test.json') results in:
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-118-771e17311e28> in <module>
----> 1 pd.read_json('test.json')

e:\Anaconda3\lib\site-packages\pandas\util\_decorators.py in wrapper(*args, **kwargs)
    212                 else:
    213                     kwargs[new_arg_name] = new_arg_value
--> 214             return func(*args, **kwargs)
    215 
    216         return cast(F, wrapper)

e:\Anaconda3\lib\site-packages\pandas\io\json\_json.py in read_json(path_or_buf, orient, typ, dtype, convert_axes, convert_dates, keep_default_dates, numpy, precise_float, date_unit, encoding, lines, chunksize, compression)
    606         return json_reader
    607 
--> 608     result = json_reader.read()
    609     if should_close:
    610         filepath_or_buffer.close()

e:\Anaconda3\lib\site-packages\pandas\io\json\_json.py in read(self)
    729             obj = self._get_object_parser(self._combine_lines(data.split("\n")))
    730         else:
--> 731             obj = self._get_object_parser(self.data)
    732         self.close()
    733         return obj

e:\Anaconda3\lib\site-packages\pandas\io\json\_json.py in _get_object_parser(self, json)
    751         obj = None
    752         if typ == "frame":
--> 753             obj = FrameParser(json, **kwargs).parse()
    754 
    755         if typ == "series" or obj is None:

e:\Anaconda3\lib\site-packages\pandas\io\json\_json.py in parse(self)
    855 
    856         else:
--> 857             self._parse_no_numpy()
    858 
    859         if self.obj is None:

e:\Anaconda3\lib\site-packages\pandas\io\json\_json.py in _parse_no_numpy(self)
   1087         if orient == "columns":
   1088             self.obj = DataFrame(
-> 1089                 loads(json, precise_float=self.precise_float), dtype=None
   1090             )
   1091         elif orient == "split":

ValueError: Expected object or value
  • The error states that these rows are not valid JSON: the file has {'work' where JSON requires {"work" (single quotes vs. double quotes).
  • Using .replace("'", '"') will not work, because the value of 'comment' is already double quoted ( "..." ) wherever it contains a word with an apostrophe (e.g. "...father's..." ). A blanket replace would turn that into "...father"s..." , which is still not valid JSON.
  • What you have is a file with rows of dicts.
  • The file needs to be read in, which gives each row as a str.
  • Use ast.literal_eval to convert each row back to a dict.
  • Read the list of dicts, rows, into a dataframe.
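The first two points can be checked on a single row before touching the 1.5 GB file. This is a minimal sketch on a shortened, hypothetical row in the same shape as the sample data; the variable names are illustrative only.

```python
import json
from ast import literal_eval

# shortened, hypothetical row: single-quoted keys, double-quoted comment with an apostrophe
row = """{'work': '12458291', 'stars': 4.0, 'comment': "After her father's death, Lena discovers secrets. ", 'user': 'aztwinmom'}"""

# a blanket replace also converts the apostrophe, which breaks the string
replaced = row.replace("'", '"')
try:
    json.loads(replaced)
    replace_ok = True
except json.JSONDecodeError:
    replace_ok = False

# literal_eval parses the row as a Python dict literal, quotes and all
record = literal_eval(row)

print(replace_ok)
print(record['comment'])
```

The remaining steps, applied to the whole file: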
import pandas as pd
from ast import literal_eval
from pathlib import Path

# read file
file = Path('e:/PythonProjects/stack_overflow/test.json')  # path to file
with file.open('r', encoding='utf-8') as f:  # open the file
    # iterate over the file object line by line (no readlines() copy) and skip blank lines
    rows = [literal_eval(row) for row in f if row.strip()]  # convert each row back to a dict

# convert rows to a dataframe
df = pd.DataFrame(rows)

# display(df)
       work flags    unixtime  stars  nhelpful          time                                                                                                                                                                                                                                                                                                                                                                                                              comment       user
0   2505753    []  1260403200    1.0         0  Dec 10, 2009  I really thought that I would like this book. I'm fascinated by this time period, and the plots to assassinate Hitler have always intrigued me. However, this book was so boring that I had to force myself to read it. The author no doubt has a commanding vocabulary, but his writing style and word choices made the book a chore to read. I've read dry textbooks that had more life to them than this novel.     schatzi
1  12458291    []  1361664000    4.0         0  Feb 24, 2013                   After her father's death, Lena discovers that her father had been keeping many secrets from her. Lena is a member of the. Silenti, telepaths who came to our world through a portal. She must learn to navigate through the social, religious, and political pitfalls of her new life. Who can she trust? What will her role be? I enjoyed this story and the world the author created very much.   aztwinmom
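Since the question says a CSV would also suffice, and holding 1.5 GB of dicts in a list may be heavy on memory, the same idea can be sketched as a streaming conversion that writes one row at a time. This is a sketch under assumptions, not part of the original answer: the function name dict_rows_to_csv and the file names are hypothetical, and it assumes every row has the same keys as the first row.

```python
import csv
from ast import literal_eval


def dict_rows_to_csv(src_path, dst_path, encoding='utf-8'):
    """Stream a file holding one dict literal per line into a CSV; return the row count."""
    count = 0
    with open(src_path, 'r', encoding=encoding) as src, \
         open(dst_path, 'w', newline='', encoding=encoding) as dst:
        writer = None
        for line in src:
            line = line.strip()
            if not line:  # skip blank lines
                continue
            record = literal_eval(line)  # str -> dict
            if writer is None:
                # assumption: the first row's keys define the CSV columns
                writer = csv.DictWriter(dst, fieldnames=list(record))
                writer.writeheader()
            writer.writerow(record)
            count += 1
    return count
```

Calling dict_rows_to_csv('test.json', 'data.csv') would then produce a file that pd.read_csv can load, optionally with chunksize to process it in pieces. Note that list values such as 'flags' end up in the CSV as their string form (e.g. []).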
