Memory error while reading a very long line in a JSON file with Python
I have a 1GB JSON file with very long lines. When I try to load a line from the file, I get this error in the PyCharm console:
Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm 2017.3.3\helpers\pydev\pydev_run_in_console.py", line 53, in run_file
    pydev_imports.execfile(file, globals, locals) # execute the script
  File "......... .py", line 26, in <module>
    for line in f:
MemoryError
PyDev console: starting.
Python 2.7.14 (v2.7.14:84471935ed, Sep 16 2017, 20:19:30) [MSC v.1500 32 bit (Intel)] on win32
I am on a Windows Server machine with 64GB of RAM.
My code is:
import numpy as np
import json
import sys
import re

idRegEx = re.compile(r".*ID=")
endElRegEx = re.compile(r"'.*")

ratingsFile = sys.argv[1]
tweetsFile = sys.argv[2]
outputFile = sys.argv[3]

tweetsMap = {}
with open(tweetsFile, "r") as f:
    for line in f:
        tweetData = json.loads(line)
        tweetsMap[tweetData["key"]] = tweetData

output = open(outputFile, "w")
with open(ratingsFile, "r") as f:
    header = f.next()
    for line in f:
        topicData = line.split("\t")
        topicKey = topicData[0]
        topicTerms = topicData[1]
        ratings = topicData[2]
        reasons = topicData[3]
        ratings = map(lambda x: int(x.strip().replace("'", "")), ratings.replace("[", "").replace("]", "").split(","))
        ratings = np.array(ratings)
        tweetsMap[topicKey]["ratings"] = ratings.tolist()
        tweetsMap[topicKey]["mean"] = ratings.mean()
        topicMap = tweetsMap[topicKey]
        print topicMap["key"], topicMap["mean"]
        json.dump(topicMap, output, sort_keys=True)
        output.write("\n")
output.close()
Line 26 in the error message refers to
    tweetData = json.loads(line)
and line 53 refers to
    json.dump(topicMap, output, sort_keys=True)
Strangely, I forked this code from GitHub, so I assumed it should work.
It looks like you are using a 32-bit version of Python:
Python 2.7.14 (...) [MSC v.1500 32 bit (Intel)] on win32
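On Windows, a 32-bit process is limited to 2GB of address space by default, which is why you get a MemoryError even though the machine has plenty of RAM. If you don't want to change your script, switching to a 64-bit version of Python should solve the problem.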
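To confirm which build is actually running before reinstalling anything, a quick check along these lines may help. This is only a minimal sketch and not part of the original answer: the helper names `interpreter_bits` and `is_64bit` are hypothetical, and it uses nothing beyond the standard `struct` and `sys` modules. It is written so that it should run on both Python 2.7 and Python 3.

```python
import struct
import sys


def interpreter_bits():
    """Return the pointer width of the running interpreter in bits."""
    # struct.calcsize("P") is the size in bytes of a C pointer (void *),
    # which is 4 on a 32-bit build and 8 on a 64-bit build.
    return struct.calcsize("P") * 8


def is_64bit():
    """Return True when the interpreter is a 64-bit build."""
    # sys.maxsize is 2**31 - 1 on 32-bit builds and 2**63 - 1 on 64-bit builds.
    return sys.maxsize > 2 ** 32


if __name__ == "__main__":
    # Hypothetical usage: run this with the same interpreter PyCharm uses.
    print("Python %s" % sys.version.split()[0])
    print("%d-bit interpreter" % interpreter_bits())
```

Run with the interpreter shown in the banner above, this would be expected to report a 32-bit build. After installing 64-bit Python and selecting it as the project interpreter in PyCharm, it should report 64.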