Memory leak in my Google App Engine code
I have the following code that is trying to loop over a large table (~100k rows; ~30GB):
def updateEmailsInLoop(cursor=None, stats={}):
    BATCH_SIZE = 10
    try:
        rawEmails, next_cursor, more = RawEmailModel.query().fetch_page(BATCH_SIZE, start_cursor=cursor)
        for index, rawEmail in enumerate(rawEmails):
            stats = process_stats(rawEmail, stats)
        i = 0
        while more and next_cursor:
            rawEmails, next_cursor, more = RawEmailModel.query().fetch_page(BATCH_SIZE, start_cursor=next_cursor)
            for index, rawEmail in enumerate(rawEmails):
                stats = process_stats(rawEmail, stats)
            i = (i + 1) % 100
            if i == 99:
                logging.info("foobar: Finished 100 more %s", str(stats))
        write_stats(stats)
    except DeadlineExceededError:
        logging.info("foobar: Deadline exceeded")
        for index, rawEmail in enumerate(rawEmails[index:], start=index):
            stats = process_stats(rawEmail, stats)
        if more and next_cursor:
            deferred.defer(updateEmailsInLoop, cursor=next_cursor, stats=stats, _queue="adminStats")
However, I keep getting the following error:
While handling this request, the process that handled this request was found to be using too much memory and was terminated. This is likely to cause a new process to be used for the next request to your application. If you see this message frequently, you may have a memory leak in your application.
...and sometimes...
Exceeded soft private memory limit of 128 MB with 154 MB after servicing 9 requests total
I changed my code so that it only ever pulls in 10 entries at any given time, so I don't understand why I'm still running out of memory.
There are 3 ways to do this kind of job (iterating over a large set of rows in the datastore):
For most of my apps that needed this, x is usually between 100 and 500. Here is the code I use to iterate over 1.5m-2m rows to generate reports or update things in my db. For reports, I save an entity containing the information I need in CSV format; at the end, I read all those entities, merge them, and delete them. (I did this to generate 1.5m rows of Excel data.) It's Java, but should be easy to translate to Python:
resp.getWriter().println("<html><head>");
resp.getWriter().println(
        "<script type='text/javascript'>function f(){window.location.href='/do/convert/" + this.getClass().getSimpleName()
                + "?cursor=" + cursorString + "&count=" + count + "';}</script>");
resp.getWriter().println("</head><body onload='f()'>");
resp.getWriter().println(
        "<a href='/do/convert/" + this.getClass().getSimpleName() + "?cursor=" + cursorString + "&count=" + count
                + "'>Next page -->" + cursorString + " </a>");
resp.getWriter().println("</body></html>");
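Since the answer says the Java snippet is easy to translate, here is a minimal Python sketch of the same idea: build a self-redirecting page that carries the cursor and count forward to the next request. The function name, `path` parameter, and values are illustrative, not part of any App Engine API.

```python
def next_page_html(path, cursor, count):
    """Build a page that immediately redirects to the next batch,
    carrying the datastore cursor and a running count in the URL
    (mirrors the Java servlet snippet above)."""
    url = "%s?cursor=%s&count=%d" % (path, cursor, count)
    return (
        "<html><head>"
        "<script type='text/javascript'>function f(){window.location.href='%s';}</script>"
        "</head><body onload='f()'>"
        "<a href='%s'>Next page --> %s</a>"
        "</body></html>" % (url, url, cursor)
    )
```

In a handler you would write this string to the response after processing one batch, so the browser itself drives the iteration one request at a time.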
If your "progress" is big and messy, save it in entities (one or more, depending on what you are doing). If you are doing the task version, I recommend either using task names or making your tasks idempotent (especially if you're counting things). If you're counting things, I recommend saving entities that contain the keys of the entities you are counting, and counting those at the end.
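The page-then-hand-off pattern described above can be sketched in plain Python, independent of App Engine. `fetch_page` here is a stand-in for ndb's `Query.fetch_page` (same `(batch, next_cursor, more)` shape); the point is that each "request" holds only one batch in memory and passes the cursor forward instead of looping over the whole table in one process.

```python
def fetch_page(rows, batch_size, cursor=0):
    """Stand-in for ndb's Query.fetch_page: return (batch, next_cursor, more)."""
    batch = rows[cursor:cursor + batch_size]
    next_cursor = cursor + len(batch)
    return batch, next_cursor, next_cursor < len(rows)

def process_all(rows, batch_size=10):
    """Iterate one batch at a time, carrying small stats forward."""
    stats = {"count": 0}
    cursor, more = 0, True
    while more:
        batch, cursor, more = fetch_page(rows, batch_size, cursor)
        for row in batch:
            stats["count"] += 1  # real code would call process_stats(row, stats)
        # On App Engine you would stop here and re-enqueue instead of looping:
        # deferred.defer(process_all, cursor=cursor, stats=stats)
    return stats
```

The commented-out `deferred.defer` line marks where a chained task (or the HTML redirect above) would take over, so no single process ever accumulates the full result set.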