My Python program runs very slowly

Question

I'm making a program that (at least right now) retrieves stream information from TwitchTV (a streaming platform). The program is meant for self-education, but when I run it, it takes 2 minutes to print just the name of the streamer.

I'm using Python 2.7.3 64-bit on Windows 7, if that matters in any way.

classes.py:

#imports:
import urllib2
import re

#classes:
class Streamer:

    #constructor:
    def __init__(self, name, mode, link):
        self.name = name
        self.mode = mode
        self.link = link

class Information:

    #constructor:
    def __init__(self, TWITCH_STREAMS, GAME, STREAMER_INFO):
        self.TWITCH_STREAMS = TWITCH_STREAMS
        self.GAME = GAME
        self.STREAMER_INFO = STREAMER_INFO

    def get_game_streamer_names(self):
        "Connects to Twitch.TV API, extracts and returns all streams for a specific game."

        #start connection
        self.con = urllib2.urlopen(self.TWITCH_STREAMS + self.GAME)
        self.info = self.con.read()
        self.con.close()

        #regular expressions to get all the stream names
        self.info = re.sub(r'"teams":\[\{.+?"\}\]', '', self.info) #remove all team names (they have the same name: parameter as streamer names)
        self.streamers_names = re.findall('"name":"(.+?)"', self.info) #looks for the name of each streamer in the pile of info


        #drop all "live_user_NAME" values (build a new list; removing items while iterating over the same list skips elements)
        self.streamers_names = [name for name in self.streamers_names if not name.startswith("live_user_")]

        #end method
        return self.streamers_names

    def get_streamer_mode(self, name):
        "Returns a streamer's mode (online/offline)"

        #start connection
        self.con = urllib2.urlopen(self.STREAMER_INFO + name)
        self.info = self.con.read()
        self.con.close()

        #check if stream is online or offline ("stream":null indicates offline stream)
        if self.info.count('"stream":null') > 0:
            return "offline"
        else:
            return "online"

main.py:

#imports:
from classes import *

#consts:
TWITCH_STREAMS = "https://api.twitch.tv/kraken/streams/?game=" #add the game name at the end of the link (space = "+", eg: Game+Name)
STREAMER_INFO  = "https://api.twitch.tv/kraken/streams/" #add streamer name at the end of the link
GAME = "League+of+Legends"

def main():
    #create an information object
    info = Information(TWITCH_STREAMS, GAME, STREAMER_INFO)

    streamer_list = [] #create a streamer list
    for name in info.get_game_streamer_names():
        #run for every streamer name, create a streamer object and place it in the list
        mode =  info.get_streamer_mode(name)
        streamer_name = Streamer(name, mode, 'http://twitch.tv/' + name)
        streamer_list.append(streamer_name)

    #this line is just to try and print something
    print streamer_list[0].name, streamer_list[0].mode


if __name__ == '__main__':
    main()

The program itself works perfectly; it's just really slow.

Any ideas?

Answer 1

Program efficiency typically falls under the 80/20 rule (or what some people call the 90/10 rule, or even the 95/5 rule). That is, 80% of the running time is spent in 20% of the code. In other words, there is a good chance that your code has a "bottleneck": a small area of the code that runs slowly, while the rest runs very fast. Your goal is to identify that bottleneck (or bottlenecks), then fix it (them) to run faster.

The best way to do this is to profile your code. That means measuring how long specific actions take: log timestamps with the logging module, use timeit as a commenter suggested, use one of the built-in profilers, or simply print out the current time at various points in the program. Eventually, you will find one part of the code that seems to be taking the most time.
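As a rough illustration of the "print the time at various points" approach, here is a minimal sketch. The `timed` helper and the two `fake_*` functions are hypothetical stand-ins, not part of the original program; the sleep merely imitates a slow network call.

```python
import time
from contextlib import contextmanager

timings = {}  # label -> total elapsed seconds

@contextmanager
def timed(label):
    """Add the wall-clock time spent inside the with-block to timings[label]."""
    start = time.time()
    try:
        yield
    finally:
        timings[label] = timings.get(label, 0.0) + (time.time() - start)

def fake_download():
    # Hypothetical stand-in for urllib2.urlopen(...).read()
    time.sleep(0.05)

def fake_parse():
    # Hypothetical stand-in for the regex/JSON parsing step
    return sum(range(1000))

with timed("download"):
    fake_download()
with timed("parse"):
    fake_parse()

# The label with the largest total is the likely bottleneck
slowest = max(timings, key=timings.get)
print(slowest, timings)
```

Wrapping the real `urlopen` calls and the parsing step in the same way should show which of them dominates the two minutes.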

Experience will tell you that I/O (stuff like reading from a disk, or accessing resources over the internet) takes longer than in-memory calculations. My guess as to the problem is that you're using one HTTP connection to get a list of streamers, and then one more HTTP connection per streamer to get the status of that streamer. Let's say that there are 10000 streamers: your program will need to make 10001 HTTP connections before it finishes.
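The arithmetic behind that guess can be written down directly. This is only a back-of-the-envelope sketch: the function names are made up for illustration, and the half-second cost per request is an assumed figure, not a measurement.

```python
def request_count(streamers):
    """One request for the streamer list plus one status request per streamer."""
    return 1 + streamers

def estimated_seconds(streamers, seconds_per_request=0.5):
    """Rough sequential running time; 0.5 s per request is an assumption."""
    return request_count(streamers) * seconds_per_request

print(request_count(10000))    # 10001
print(estimated_seconds(250))  # 125.5, i.e. a little over two minutes
```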

There are a few ways to fix this, if this is indeed the case:

See if Twitch.TV has an alternative in their API that lets you retrieve a list of users WITH their streaming mode, so that you don't need to call the API once per streamer.
Cache results. This won't help your program run faster the first time it runs, but you might be able to make it so that if it runs a second time within a minute, it can reuse the results.
Limit your application to dealing with only a few streamers at a time. If there are 10000 streamers, what exactly does your application do that it really needs to look at the mode of all 10000 of them? Perhaps it's better to just grab the top 20, at which point the user can press a key to get the next 20, or close the application. Oftentimes, programming is not just about writing code, but about managing expectations of what your users want. This seems to be a pet project, so there might not be "users", meaning you have free rein to change what the app does.
Use multiple connections. Right now, your app makes one connection to the server, waits for the results to come back, parses the results, saves them, then starts on the next connection. This process might take an entire half a second. If there were 250 streamers, running this process for each of them would take a little over two minutes total. However, if you could run four of them at a time, you could potentially reduce your time to a little over 30 seconds total. Check out the multiprocessing module. Keep in mind that some APIs limit how many connections you can make at a time, so hitting them with 50 connections at once might irk them and cause them to forbid you from accessing their API. Use caution here.
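The last two suggestions (a small page of streamers, several connections at once) could be combined roughly as follows. This is a sketch under stated assumptions: it uses `ThreadPool` from `multiprocessing.pool`, which runs threads rather than processes (usually enough for I/O-bound work like waiting on HTTP responses), and `fake_get_streamer_mode`, `FAKE_STATUS`, and `fetch_modes` are hypothetical names. The fake fetch sleeps instead of calling the real API.

```python
import time
from multiprocessing.pool import ThreadPool

# Made-up data for illustration only
FAKE_STATUS = {"alice": "online", "bob": "offline", "carol": "online", "dave": "offline"}

def fake_get_streamer_mode(name):
    """Hypothetical stand-in for Information.get_streamer_mode (no real HTTP request)."""
    time.sleep(0.1)  # imitate network latency
    return FAKE_STATUS.get(name, "offline")

def fetch_modes(names, fetch=fake_get_streamer_mode, workers=4, limit=20):
    """Look up the mode of at most `limit` streamers, `workers` requests at a time."""
    page = names[:limit]          # only deal with the first page of streamers
    pool = ThreadPool(workers)
    try:
        modes = pool.map(fetch, page)  # results come back in input order
    finally:
        pool.close()
        pool.join()
    return list(zip(page, modes))

results = fetch_modes(["alice", "bob", "carol", "dave"])
print(results)
```

In the real program, `fetch` would be `info.get_streamer_mode`. Note that the method in the question stores its connection and response on `self`, so it would need to use local variables instead before it is safe to call from several threads. Keeping `workers` small also respects the caution above about API limits.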

Answer 2

You are using the wrong tool here to parse the JSON data returned by your URL. You should use the json library that ships with Python rather than parsing the data with regex. This will give you a boost in your program's performance.

Change the regex parser:

#regular expressions to get all the stream names
self.info = re.sub(r'"teams":\[\{.+?"\}\]', '', self.info) #remove all team names (they have the same name: parameter as streamer names)
self.streamers_names = re.findall('"name":"(.+?)"', self.info) #looks for the name of each streamer in the pile of info

to the json parser:

#requires "import json" at the top of classes.py
self.info = json.loads(self.info) #parse the JSON text into Python objects (dicts and lists)
#pull the name out of each stream and return a generator
return (stream['name'] for stream in self.info[u'streams'])
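To make the idea concrete without touching the network, here is a small self-contained sketch. The `SAMPLE` payload is invented: its shape (a `streams` list whose entries carry a `live_user_...` name plus a nested `channel` with its own `name` and `teams`) is an assumption inferred from the regexes in the question, not a documented description of the Twitch API, so check a real response before relying on these keys.

```python
import json

# Invented payload; the real API response may be shaped differently
SAMPLE = '''{"streams": [
  {"name": "live_user_alice", "channel": {"name": "alice", "teams": [{"name": "team_x"}]}},
  {"name": "live_user_bob", "channel": {"name": "bob", "teams": []}}
]}'''

def streamer_names(raw):
    """Return the channel name of every stream in a JSON response string."""
    data = json.loads(raw)  # JSON text -> nested dicts and lists
    return [stream["channel"]["name"] for stream in data["streams"]]

print(streamer_names(SAMPLE))
```

Because the names are read by key, the team names and the `live_user_` entries never get mixed in, so neither the `re.sub` cleanup nor the filtering loop would be needed under this assumed layout.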


Answer 1
8 2013-02-24 17:09:43

Answer 2
4 2013-02-24 16:58:23
