
Python accessing multiple webpages at once

I have a tkinter GUI that downloads data from multiple websites at once. I run a separate thread for each download (about 28). Is that too many threads for one GUI process? It's really slow: each individual page should take about 1 to 2 seconds, but when all are run at once it takes over 40 seconds. Is there any way I can shorten the time it takes to download all the pages? Any help is appreciated, thanks.

It's probably the GIL (global interpreter lock) that gets in your way. Python has some performance problems with many threads.

You could try twisted.web.client.getPage (see http://twistedmatrix.com/projects/core/documentation/howto/async.html, a bit down the page). I don't have benchmarks for that, but taking the example on that page and adding 28 deferreds will quickly give you a comparable result. Keep in mind, though, that you'd have to use a GUI-integrated reactor (such as the gtk reactor) and get into Twisted's programming style.

A process can have hundreds of threads on any modern OS without any problem.
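As a rough illustration of that point, here is a minimal sketch that starts a few hundred threads that do nothing but wait. The helper names (`wait_like_a_download`, `run_threads`) are made up for this example, and `time.sleep` is only a stand-in for a blocking network read, so this shows thread overhead rather than real download behaviour.

```python
import threading
import time


def wait_like_a_download(delay):
    # Assumption: the real work is I/O-bound, so each thread mostly waits.
    # time.sleep stands in for that waiting.
    time.sleep(delay)


def run_threads(count, delay):
    """Start `count` waiting threads, join them all, return elapsed seconds."""
    threads = [
        threading.Thread(target=wait_like_a_download, args=(delay,))
        for _ in range(count)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    elapsed = run_threads(count=200, delay=0.2)
    # Run one after another, 200 waits of 0.2 s would take 40 s.
    print(f"200 threads finished in {elapsed:.2f}s")
```

If the elapsed time stays close to the single delay rather than growing with the thread count, the number of threads by itself is unlikely to be the bottleneck.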

If you're bandwidth-limited, 1 to 2 seconds times 28 means 40 seconds is about right. If you're latency-limited, it should be faster, but with no information, all I can suggest is:

  • add logging to your code to make sure it's actually running in parallel, and that you're not accidentally serializing your threads somehow;
  • use a network monitor to make sure that network requests are actually going out in parallel.
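The logging suggestion could look something like the sketch below. Everything here is hypothetical scaffolding: `fetch` uses `time.sleep` in place of the real download, and `max_overlap` is a small helper invented for this example that counts how many downloads were in flight at the same moment. If the peak is 1, the threads are being serialized somewhere.

```python
import logging
import threading
import time

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s.%(msecs)03d %(threadName)s %(message)s",
    datefmt="%H:%M:%S",
)


def fetch(url, intervals, lock):
    # Hypothetical stand-in for the real download; swap in your own code.
    logging.info("start %s", url)
    started = time.perf_counter()
    time.sleep(0.2)
    finished = time.perf_counter()
    logging.info("done  %s", url)
    with lock:
        intervals.append((started, finished))


def run_all(urls):
    """Run one thread per URL and return the (start, end) time of each."""
    intervals, lock = [], threading.Lock()
    threads = [
        threading.Thread(target=fetch, args=(u, intervals, lock)) for u in urls
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return intervals


def max_overlap(intervals):
    """Return the largest number of intervals open at the same moment."""
    events = []
    for start, end in intervals:
        events.append((start, 1))
        events.append((end, -1))
    # At equal times an end (-1) sorts before a start (+1), so intervals
    # that merely touch are not counted as overlapping.
    events.sort()
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


if __name__ == "__main__":
    urls = [f"http://example.com/page{i}" for i in range(5)]
    print("peak concurrent downloads:", max_overlap(run_all(urls)))
```

The timestamps and thread names in the log lines give the same information by eye: interleaved "start" lines from different threads mean the work overlaps.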

It's hard to give anything better without more information.

You can try using processes instead of threads. Python has the GIL, which might cause some delays in your situation.
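A minimal sketch of the process-based variant, using the standard library's `concurrent.futures.ProcessPoolExecutor`. The `download` function is a hypothetical placeholder (it sleeps and returns the URL length instead of fetching anything), and the worker count of 4 is an arbitrary assumption. Whether this helps depends on whether the time goes to CPU work in Python or to waiting on the network.

```python
import time
from concurrent.futures import ProcessPoolExecutor


def download(url):
    # Hypothetical placeholder: replace the sleep with the real download.
    # It must be a top-level function so worker processes can import it.
    time.sleep(0.2)
    return url, len(url)


def download_all(urls, workers=4):
    """Run `download` for every URL in a pool of worker processes."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(download, urls))


if __name__ == "__main__":
    # The guard is needed on platforms that start workers by re-importing
    # this module (for example Windows).
    urls = [f"http://example.com/page{i}" for i in range(8)]
    start = time.perf_counter()
    results = download_all(urls)
    print(f"{len(results)} pages in {time.perf_counter() - start:.2f}s")
```

In a tkinter program the pool would still need to be driven from a background thread or polled, so that waiting for results does not block the GUI's main loop.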
