
Does my code leak memory (python)?

    links_list = char.getLinks(words)
    for source_url in links_list:
        try:
            print 'Downloading URL: ' + source_url
            urldict = hash_url(source_url)
            source_url_short = urldict['url_short']
            source_url_hash = urldict['url_short_hash']
            if Url.objects.filter(source_url_short = source_url_short).count() == 0:
                try:
                    htmlSource = getSource(source_url)
                except:
                    htmlSource = '-'
                    print '\thtmlSource got an error...'
                new_u = Url(source_url = source_url, source_url_short = source_url_short, source_url_hash = source_url_hash, html = htmlSource)
                new_u.save()
                time.sleep(3)
            else:
                print '\tAlready in database'
        except:
            print '\tError with downloading URL..'
            time.sleep(3)
            pass


def getSource(theurl, unicode = 1, moved = 0):
    if moved == 1:
        theurl = urllib2.urlopen(theurl).geturl()
    urlReq = urllib2.Request(theurl)
    urlReq.add_header('User-Agent',random.choice(agents))
    urlResponse = urllib2.urlopen(urlReq)
    htmlSource = urlResponse.read()
    htmlSource = htmlSource.decode('utf-8').encode('utf-8')
    return htmlSource

Basically, what this code does is take a list of URLs, download them, and save them to a DB. That's all.

Maybe your process uses too much memory and the server (perhaps a shared host) just kills it because you exhaust your memory quota.

Here you use a call that may eat up a lot of memory:

links_list = char.getLinks(words)
for source_url in links_list:
     ...

It looks like you might be building a whole list in memory and then working with the items. Instead, it might be better to use an iterator, where objects are retrieved one at a time. But this is a guess, because it's hard to tell from your code what char.getLinks does.
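
For example, a minimal sketch of the iterator idea (getLinksIter, fetch_page_for and extract_urls are hypothetical names, since char.getLinks's internals aren't shown):

def getLinksIter(words):
    # Hypothetical generator variant of char.getLinks: instead of building
    # and returning the whole list, it yields one URL at a time, so the
    # full list never has to sit in memory at once.
    for word in words:
        page = fetch_page_for(word)       # hypothetical lookup step
        for url in extract_urls(page):    # hypothetical parsing step
            yield url

for source_url in getLinksIter(words):
    print 'Downloading URL: ' + source_url  # then proceed as in the question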

If you are using Django in debug mode, then memory usage will go up, as Mark suggests.

If you are doing this in Django, make sure DEBUG is False, otherwise it will cache every query.

See the FAQ.
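
As a minimal sketch of what that means in this loop: with DEBUG = True, Django appends every executed query to django.db.connection.queries, so a long-running loop keeps growing that list unless DEBUG is off or the log is cleared with reset_queries() (save_url below is a hypothetical stand-in for the Url(...).save() code in the question):

from django.conf import settings
from django.db import connection, reset_queries

for source_url in links_list:
    save_url(source_url)  # hypothetical stand-in for the Url(...).save() code above
    if settings.DEBUG:
        print '\tqueries cached so far: %d' % len(connection.queries)
        reset_queries()   # drop the cached query log if DEBUG has to stay on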

The easiest way to check is to go to the task manager (on Windows, or the equivalent on other platforms) and check the memory requirements of the Python process. If it stays constant, there are no memory leaks. If not, you have a memory leak somewhere and you will need to debug it.
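
If you would rather watch it from inside the script than in the task manager, here is a small sketch using the standard-library resource module (Unix only; download_and_save is a hypothetical wrapper around the download/save code in the question):

import resource

def report_memory(label):
    # Peak resident set size of this process so far
    # (kilobytes on Linux, bytes on OS X).
    usage = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print '%s: peak RSS = %s' % (label, usage)

report_memory('before loop')
for source_url in links_list:
    download_and_save(source_url)  # hypothetical wrapper around the code in the question
report_memory('after loop')       # if this keeps climbing, something is being retained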

Perhaps you should get a job server such as beanstalkd and think about doing just one at a time.

The job server will requeue the jobs that fail, allowing the rest to complete. You can also run more than one client concurrently should you need to (even on more than one machine).

Simpler design, easier to understand and test, more fault tolerant, retryable, more scalable, etc.
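
A rough sketch of that setup, assuming the beanstalkc client library and a beanstalkd server on localhost:11300 (download_and_save is again a hypothetical wrapper around the code in the question):

import beanstalkc

queue = beanstalkc.Connection(host='localhost', port=11300)

# Producer: enqueue one job per URL instead of looping over them all in one process.
for source_url in char.getLinks(words):
    queue.put(source_url)

# Worker (run separately, possibly several of them, even on other machines):
while True:
    job = queue.reserve()
    try:
        download_and_save(job.body)  # hypothetical wrapper around the code in the question
        job.delete()                 # success: remove the job from the queue
    except Exception:
        job.release(delay=60)        # failure: put it back to be retried later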
