Simple Python script using too much CPU

Question

I was recently told off by my VPS provider because my Python script was using too much CPU (apparently the script was using an entire core for a few hours).

My script uses the twython library to stream tweets:

def on_success(self, data):

    if 'text' in data:
        self.counter += 1
        self.tweetDatabase.save(Tweet(data))

        # We only want to commit once we have a full batch
        if self.counter >= 1000:
            print("{0}: committing {1} tweets".format(datetime.now(), self.counter))
            self.counter = 0
            self.tweetDatabase.commit()

Tweet is a class whose job is to throw away the tweet metadata I do not need:

class Tweet():

    def __init__(self, json):

        self.user = {"id" : json.get('user').get('id_str'), "name" : json.get('user').get('name')}
        self.timeStamp = datetime.datetime.strptime(json.get('created_at'), '%a %b %d %H:%M:%S %z %Y')
        self.coordinates  = json.get('coordinates')
        self.tweet = {
                        "id" : json.get('id_str'),
                        "text" : json.get('text').split('#')[0],
                        "entities" : json.get('entities'),
                        "place" :  json.get('place')
                     }

        self.favourite = json.get('favorite_count')
        self.reTweet = json.get('retweet_count')

It also has a __str__ method that returns a very compact string representation of the object.

tweetDatabase.commit() just writes the tweets to a file, while tweetDatabase.save() just appends the tweet to a list:

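def save(self, tweet):
    # Buffer the compact string form until the next commit
    self.tweets.append(str(tweet))

def commit(self):
    with open(self.path, mode='a', encoding='utf-8') as f:
        # Trailing newline keeps consecutive batches from running together
        f.write('\n'.join(self.tweets) + '\n')

    self.tweets = []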

What's the best way to keep the CPU usage low? If I sleep I will lose tweets, as that is time the program spends not listening to Twitter's API. Despite this, I tried sleeping for a second after the program writes to the file, but that did nothing to bring the CPU usage down. For the record, saving to file every 1000 tweets happens just over once a minute.

Many thanks.

Answer 1

You can try profiling your program with:

import cProfile
command = """<whatever line that starts your program>"""
cProfile.runctx( command, globals(), locals(), filename="OpenGLContext.profile" )

and then viewing OpenGLContext.profile with RunSnakeRun (http://www.vrplumber.com/programming/runsnakerun/).

The bigger a block is, the more CPU time that function takes. This will help you locate exactly which part of your program is using a lot of CPU.
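If a graphical viewer is not available on the VPS, the standard-library pstats module can print the same profile data as text. The following is only a minimal sketch: busy_work is a hypothetical stand-in for whatever starts the real script, and profile_to_text is a helper name made up for this example.

```python
import cProfile
import io
import pstats


def busy_work(n):
    """Hypothetical stand-in for the real entry point of the script."""
    return sum(i * i for i in range(n))


def profile_to_text(func, *args, top=10):
    """Profile func(*args) and return a text report sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)

    # Send the report to a string buffer instead of stdout
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


if __name__ == "__main__":
    print(profile_to_text(busy_work, 100_000))
```

The functions at the top of the report are the ones that account for the most time, which is the same information the block sizes convey in RunSnakeRun.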

Answer 2

Try checking whether you need to commit first in on_success(). Then check whether the tweet has data you want to save. You might also want to consider race conditions on the self.counter variable; the update to self.counter should probably be wrapped in a mutex or something similar.
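A rough sketch of that ordering with a lock around the counter might look like the code below. It assumes on_success can be called from more than one thread, which the question does not confirm; if the callback only ever runs on a single thread, the lock is unnecessary. BatchingHandler and InMemoryDatabase are hypothetical names used for illustration and are not part of twython.

```python
import threading


class InMemoryDatabase:
    """Hypothetical stand-in for tweetDatabase: buffers items, commits in batches."""

    def __init__(self):
        self.pending = []
        self.batches = []

    def save(self, tweet):
        self.pending.append(tweet)

    def commit(self):
        self.batches.append(self.pending)
        self.pending = []


class BatchingHandler:
    """Hypothetical stand-in for the streamer class from the question."""

    def __init__(self, database, batch_size=1000):
        self.database = database
        self.batch_size = batch_size
        self.counter = 0
        self._lock = threading.Lock()

    def on_success(self, data):
        # The lock makes the check, the commit, and the counter update atomic
        with self._lock:
            # First decide whether the previous batch is due for a commit
            if self.counter >= self.batch_size:
                self.database.commit()
                self.counter = 0
            # Then keep the tweet only if it carries the data we want
            if 'text' in data:
                self.database.save(data)
                self.counter += 1


if __name__ == "__main__":
    handler = BatchingHandler(InMemoryDatabase(), batch_size=3)
    for i in range(7):
        handler.on_success({'text': 'tweet {0}'.format(i)})
    print(len(handler.database.batches), len(handler.database.pending))
```

This addresses correctness of the batching rather than CPU usage itself, so it is best combined with profiling to find where the time actually goes.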

Answer 1: 1, 2014-02-06 00:15:15
Answer 2: 1, accepted, 2014-02-06 05:30:20
