My Python program is running really slow
I'm making a program that (at least right now) retrieves stream information from TwitchTV (a streaming platform). I'm writing it to teach myself, but when I run it, it takes 2 minutes just to print the name of a single streamer.
I'm using Python 2.7.3 64-bit on Windows 7, if that matters in any way.
classes.py:
```python
#imports:
import urllib2
import re

#classes:
class Streamer:
    #constructor:
    def __init__(self, name, mode, link):
        self.name = name
        self.mode = mode
        self.link = link

class Information:
    #constructor:
    def __init__(self, TWITCH_STREAMS, GAME, STREAMER_INFO):
        self.TWITCH_STREAMS = TWITCH_STREAMS
        self.GAME = GAME
        self.STREAMER_INFO = STREAMER_INFO

    def get_game_streamer_names(self):
        "Connects to Twitch.TV API, extracts and returns all streams for a specific game."
        #start connection
        self.con = urllib2.urlopen(self.TWITCH_STREAMS + self.GAME)
        self.info = self.con.read()
        self.con.close()
        #regular expressions to get all the stream names
        self.info = re.sub(r'"teams":\[\{.+?"\}\]', '', self.info) #remove all team names (they have the same name: parameter as streamer names)
        self.streamers_names = re.findall('"name":"(.+?)"', self.info) #looks for the name of each streamer in the pile of info
        #drop all "live_user_NAME" values; build a new list instead of calling
        #remove() while iterating, which silently skips the following element
        self.streamers_names = [name for name in self.streamers_names
                                if not name.startswith("live_user_")]
        #end method
        return self.streamers_names

    def get_streamer_mode(self, name):
        "Returns a streamer's mode (on/off)"
        #start connection
        self.con = urllib2.urlopen(self.STREAMER_INFO + name)
        self.info = self.con.read()
        self.con.close()
        #check if stream is online or offline ("stream":null indicates offline stream)
        if self.info.count('"stream":null') > 0:
            return "offline"
        else:
            return "online"
```
main.py:
```python
#imports:
from classes import *

#consts:
TWITCH_STREAMS = "https://api.twitch.tv/kraken/streams/?game=" #add the game name at the end of the link (space = "+", eg: Game+Name)
STREAMER_INFO = "https://api.twitch.tv/kraken/streams/" #add streamer name at the end of the link
GAME = "League+of+Legends"

def main():
    #create an information object
    info = Information(TWITCH_STREAMS, GAME, STREAMER_INFO)
    streamer_list = [] #create a streamer list
    for name in info.get_game_streamer_names():
        #run for every streamer name, create a streamer object and place it in the list
        mode = info.get_streamer_mode(name)
        streamer_name = Streamer(name, mode, 'http://twitch.tv/' + name)
        streamer_list.append(streamer_name)
    #this line is just to try and print something
    print streamer_list[0].name, streamer_list[0].mode

if __name__ == '__main__':
    main()
```
The program itself works perfectly, it's just really slow.
Any ideas?
Program efficiency typically follows the 80/20 rule (some people call it the 90/10 rule, or even the 95/5 rule): 80% of the running time is spent in 20% of the code. In other words, there is a good chance that your code has a "bottleneck": a small area of the code that runs slowly while the rest runs very fast. Your goal is to identify that bottleneck (or bottlenecks), then fix it (or them) so it runs faster.
The best way to do this is to profile your code. That means logging the time at which specific actions occur with the logging module, using timeit as a commenter suggested, using one of the built-in profilers, or simply printing the current time at various points in the program. Eventually, you will find one part of the code that takes most of the time.
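A minimal sketch of the built-in-profiler route, written for Python 3 (the cProfile and pstats modules are also in 2.7's standard library). The function `fetch_streamer_mode` is a hypothetical stand-in that sleeps instead of making an HTTP request, so the sketch runs offline; in a report sorted by cumulative time, the slow call shows up near the top.

```python
import cProfile
import io
import pstats
import time


def fetch_streamer_mode(name):
    """Hypothetical stand-in for one HTTP request: sleeps instead of using the network."""
    time.sleep(0.01)
    return "online"


def run_program():
    """Hypothetical stand-in for main(): one 'request' per streamer name."""
    names = ["streamer%d" % i for i in range(20)]
    return [fetch_streamer_mode(name) for name in names]


def profile(func):
    """Run func under cProfile; return (func's result, report sorted by cumulative time)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func()
    profiler.disable()
    buffer = io.StringIO()
    pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(10)
    return result, buffer.getvalue()


if __name__ == "__main__":
    modes, report = profile(run_program)
    print(report)
```

For the real program, `python -m cProfile -s cumulative main.py` produces the same kind of report without changing any code.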
Experience will tell you that I/O (reading from a disk, accessing resources over the internet, and so on) takes far longer than in-memory calculations. My guess is that you're using one HTTP connection to get the list of streamers, and then one more HTTP connection per streamer to get that streamer's status. Say there are 10,000 streamers: your program will need to make 10,001 HTTP connections before it finishes.
There are a few ways to fix this if that is indeed the case:
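One common approach, sketched here as an assumption rather than as the answerer's own list, is to issue the per-streamer requests concurrently so that their network waits overlap. This assumes Python 3's `concurrent.futures` (on 2.7 the same module is available through the `futures` backport on PyPI). `get_streamer_mode` is a hypothetical stand-in that sleeps for about 50 ms instead of calling the Twitch API, so the timing difference can be observed offline.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def get_streamer_mode(name):
    """Hypothetical stand-in for one blocking HTTP request (about 50 ms of waiting)."""
    time.sleep(0.05)
    return "offline" if name.endswith("0") else "online"


def modes_sequential(names):
    # One request after another: total time is roughly len(names) * latency.
    return {name: get_streamer_mode(name) for name in names}


def modes_concurrent(names, max_workers=10):
    # Up to max_workers requests wait on the network at the same time;
    # pool.map yields results in the same order as names.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(names, pool.map(get_streamer_mode, names)))


if __name__ == "__main__":
    names = ["streamer%d" % i for i in range(20)]
    for fn in (modes_sequential, modes_concurrent):
        start = time.perf_counter()
        fn(names)
        print(fn.__name__, round(time.perf_counter() - start, 2), "seconds")
```

In the real program, the body of `get_streamer_mode` would be the `urlopen`/`read` call. Note that the question's `Information.get_streamer_mode` stores the response on `self.info`, which all threads would share, so a concurrent version should keep the response in a local variable. Keeping `max_workers` modest is also prudent, since a public API may throttle clients that open many connections at once.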
You are using the wrong tool to parse the JSON data returned by your URL. You should use the json module that ships with Python rather than parsing the data with regular expressions. This is more robust than regex matching, and it may also give your program a small performance boost.
Change the regex parser:
```python
#regular expressions to get all the stream names
self.info = re.sub(r'"teams":\[\{.+?"\}\]', '', self.info) #remove all team names (they have the same name: parameter as streamer names)
self.streamers_names = re.findall('"name":"(.+?)"', self.info) #looks for the name of each streamer in the pile of info
```
to the json parser:
```python
import json  # add this at the top of classes.py

data = json.loads(self.info)  # parse the JSON text into Python dicts and lists
# return a generator over the name of each stream
return (stream['name'] for stream in data[u'streams'])
```
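To see why the structured parse is more robust, here is a small offline comparison in Python 3. The payload is a hypothetical, heavily trimmed response. Its layout (a stream-level `"live_user_..."` name, plus a nested `"channel"` object with its own `"name"` and a `"teams"` list) is only inferred from what the question's regexes strip out, so check the real response for the key that holds the name you want. The regex matches every `"name"` key at any depth, while the parsed data lets you pick exactly one field per stream.

```python
import json
import re

# Hypothetical, trimmed response body; the real API's field layout may differ.
SAMPLE_RESPONSE = (
    '{"streams":['
    '{"name":"live_user_alice","channel":{"name":"alice","teams":[{"name":"team_a"}]}},'
    '{"name":"live_user_bob","channel":{"name":"bob","teams":[]}}'
    ']}'
)


def names_with_regex(raw):
    # The question's pattern: stream, channel and team names all match.
    return re.findall('"name":"(.+?)"', raw)


def names_with_json(raw):
    # Parse once, then address the field by its position in the structure.
    data = json.loads(raw)
    return [stream["channel"]["name"] for stream in data["streams"]]


if __name__ == "__main__":
    print(names_with_regex(SAMPLE_RESPONSE))
    print(names_with_json(SAMPLE_RESPONSE))
```

With the JSON version there is no need to delete team names or filter out `live_user_` entries afterwards.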