ValueError: Expected object or value <-> Can't load a JSON file into a pandas dataframe or convert it to CSV; either will suffice
I have a JSON file of approximately 1.5 GB that I need to use as a dataframe. I have spent 10 hours straight trying to load it as a dataframe and have gone through every answered question I could find on Stack Overflow. As a second option, I tried to convert it to CSV and then load that as a dataframe, but that also fails. In the previously answered questions, people only explained the error rather than giving code. Here is what the JSON looks like:
{'work': '2505753', 'flags': [], 'unixtime': 1260403200, 'stars': 1.0, 'nhelpful': 0, 'time': 'Dec 10, 2009', 'comment': "I really thought that I would like this book. I'm fascinated by this time period, and the plots to assassinate Hitler have always intrigued me. However, this book was so boring that I had to force myself to read it. The author no doubt has a commanding vocabulary, but his writing style and word choices made the book a chore to read. I've read dry textbooks that had more life to them than this novel. ", 'user': 'schatzi'}
{'work': '12458291', 'flags': [], 'unixtime': 1361664000, 'stars': 4.0, 'nhelpful': 0, 'time': 'Feb 24, 2013', 'comment': "After her father's death, Lena discovers that her father had been keeping many secrets from her. Lena is a member of the. Silenti, telepaths who came to our world through a portal. She must learn to navigate through the social, religious, and political pitfalls of her new life. Who can she trust? What will her role be? I enjoyed this story and the world the author created very much. ", 'user': 'aztwinmom'}
I tried this code as a second option to convert to CSV. The error I debugged was caused by the single quotes, but replacing "\'" with "\"" in data this large will take an enormous amount of time.
import json
import csv
import os
f = open('test.json')
data = json.load(f)
f.close()
f = open('data.json')
csv_file = csv.writer(f)
count=0
for item in data:
    f.writerow(item)
    count+=1
    if(count==10):
        break
f.close()
---------------------------------------------------------------------------
JSONDecodeError Traceback (most recent call last)
<ipython-input-115-d75bae392cae> in <module>
1 f = open('test.json')
----> 2 data = json.load(f)
3 f.close()
e:\Anaconda3\lib\json\__init__.py in load(fp, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
294 cls=cls, object_hook=object_hook,
295 parse_float=parse_float, parse_int=parse_int,
--> 296 parse_constant=parse_constant, object_pairs_hook=object_pairs_hook, **kw)
297
298
e:\Anaconda3\lib\json\__init__.py in loads(s, encoding, cls, object_hook, parse_float, parse_int, parse_constant, object_pairs_hook, **kw)
346 parse_int is None and parse_float is None and
347 parse_constant is None and object_pairs_hook is None and not kw):
--> 348 return _default_decoder.decode(s)
349 if cls is None:
350 cls = JSONDecoder
e:\Anaconda3\lib\json\decoder.py in decode(self, s, _w)
335
336 """
--> 337 obj, end = self.raw_decode(s, idx=_w(s, 0).end())
338 end = _w(s, end).end()
339 if end != len(s):
e:\Anaconda3\lib\json\decoder.py in raw_decode(self, s, idx)
351 """
352 try:
--> 353 obj, end = self.scan_once(s, idx)
354 except StopIteration as err:
355 raise JSONDecodeError("Expecting value", s, err.value) from None
JSONDecodeError: Expecting property name enclosed in double quotes: line 1 column 2 (char 1)
pd.read_json('test.json')
results in:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-118-771e17311e28> in <module>
----> 1 pd.read_json('test.json')
e:\Anaconda3\lib\site-packages\pandas\util\_decorators.py in wrapper(*args, **kwargs)
212 else:
213 kwargs[new_arg_name] = new_arg_value
--> 214 return func(*args, **kwargs)
215
216 return cast(F, wrapper)
e:\Anaconda3\lib\site-packages\pandas\io\json\_json.py in read_json(path_or_buf, orient, typ, dtype, convert_axes, convert_dates, keep_default_dates, numpy, precise_float, date_unit, encoding, lines, chunksize, compression)
606 return json_reader
607
--> 608 result = json_reader.read()
609 if should_close:
610 filepath_or_buffer.close()
e:\Anaconda3\lib\site-packages\pandas\io\json\_json.py in read(self)
729 obj = self._get_object_parser(self._combine_lines(data.split("\n")))
730 else:
--> 731 obj = self._get_object_parser(self.data)
732 self.close()
733 return obj
e:\Anaconda3\lib\site-packages\pandas\io\json\_json.py in _get_object_parser(self, json)
751 obj = None
752 if typ == "frame":
--> 753 obj = FrameParser(json, **kwargs).parse()
754
755 if typ == "series" or obj is None:
e:\Anaconda3\lib\site-packages\pandas\io\json\_json.py in parse(self)
855
856 else:
--> 857 self._parse_no_numpy()
858
859 if self.obj is None:
e:\Anaconda3\lib\site-packages\pandas\io\json\_json.py in _parse_no_numpy(self)
1087 if orient == "columns":
1088 self.obj = DataFrame(
-> 1089 loads(json, precise_float=self.precise_float), dtype=None
1090 )
1091 elif orient == "split":
ValueError: Expected object or value
The file is not valid JSON: each row starts with {'work', and JSON would be {"work", single quotes vs. double quotes.
.replace("'", '"') will not work, because the value of 'comment' is double quoted ("...") wherever it contains words with an apostrophe (e.g. "...father's..."). Using replace will produce a result like '...father"s...'.
The file consists of rows of dicts.
When read from the file, each row is of str type.
Use ast.literal_eval to convert each row back to a dict type.
Read the list of dicts, rows, into a dataframe.
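The quoting problem can be reproduced on a single row. The following is a minimal sketch, assuming a row shaped like the ones in the question; the `row` string is a shortened sample made up for illustration, not the full file content. It tries `json.loads` on the raw row, then on the row after a blind quote swap, and finally parses it with `ast.literal_eval`.

```python
import json
from ast import literal_eval

# One row in the same shape as the file: single-quoted keys and a
# double-quoted value containing an apostrophe (shortened sample).
row = """{'work': '12458291', 'stars': 4.0, 'comment': "After her father's death", 'user': 'aztwinmom'}"""

# 1. json rejects the row as-is because the keys are single quoted.
try:
    json.loads(row)
    parsed_as_json = True
except json.JSONDecodeError:
    parsed_as_json = False

# 2. Blindly swapping quotes turns father's into father"s,
#    which ends the JSON string early and is still invalid.
swapped = row.replace("'", '"')
try:
    json.loads(swapped)
    swapped_ok = True
except json.JSONDecodeError:
    swapped_ok = False

# 3. literal_eval reads the row as a Python dict literal.
record = literal_eval(row)
print(parsed_as_json, swapped_ok, record['comment'])
```

Both JSON attempts fail, while `literal_eval` returns a regular dict with the apostrophe intact.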
import pandas as pd
from ast import literal_eval
from pathlib import Path
# read file
file = Path('e:/PythonProjects/stack_overflow/test.json') # path to file
with file.open('r', encoding='utf-8') as f:  # open the file
    rows = [literal_eval(row) for row in f.readlines()]  # list comprehension to convert each row back to a dict
# convert rows to a dataframe
df = pd.DataFrame(rows)
# display(df)
work flags unixtime stars nhelpful time comment user
0 2505753 [] 1260403200 1.0 0 Dec 10, 2009 I really thought that I would like this book. I'm fascinated by this time period, and the plots to assassinate Hitler have always intrigued me. However, this book was so boring that I had to force myself to read it. The author no doubt has a commanding vocabulary, but his writing style and word choices made the book a chore to read. I've read dry textbooks that had more life to them than this novel. schatzi
1 12458291 [] 1361664000 4.0 0 Feb 24, 2013 After her father's death, Lena discovers that her father had been keeping many secrets from her. Lena is a member of the. Silenti, telepaths who came to our world through a portal. She must learn to navigate through the social, religious, and political pitfalls of her new life. Who can she trust? What will her role be? I enjoyed this story and the world the author created very much. aztwinmom
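For a file of about 1.5 GB, holding every parsed row in a list at once may use a lot of memory. Since converting to CSV was also acceptable in the question, here is a hedged sketch that streams the file line by line and writes a CSV with the standard library only. The helper names `iter_rows` and `rows_to_csv` and the output file name are hypothetical; the column names are taken from the sample rows, and the sketch assumes every line is one dict literal.

```python
import csv
from ast import literal_eval

# Column names as they appear in the sample rows of the question.
FIELDNAMES = ['work', 'flags', 'unixtime', 'stars', 'nhelpful',
              'time', 'comment', 'user']


def iter_rows(path, encoding='utf-8'):
    """Yield one dict per non-empty line without loading the whole file."""
    with open(path, 'r', encoding=encoding) as f:
        for line in f:  # iterating the file object reads lazily
            line = line.strip()
            if line:
                yield literal_eval(line)


def rows_to_csv(src_path, csv_path, fieldnames=FIELDNAMES):
    """Stream the dict-per-line file into a CSV file; return the row count."""
    count = 0
    with open(csv_path, 'w', newline='', encoding='utf-8') as out:
        # extrasaction='ignore' skips keys that are not in fieldnames;
        # keys missing from a row are written as empty strings.
        writer = csv.DictWriter(out, fieldnames=fieldnames,
                                extrasaction='ignore')
        writer.writeheader()
        for row in iter_rows(src_path):
            writer.writerow(row)
            count += 1
    return count
```

A call such as `rows_to_csv('test.json', 'test.csv')` would produce a file that `pd.read_csv` can load, optionally with its `chunksize` parameter if the full table does not fit in memory. Note that `literal_eval` is slower than a JSON parser, so the conversion may still take a while on a file this size.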