
Is there a way to improve the speed of a for-loop dealing with a list?

I want to improve the performance of my code. I tried a few approaches that people advised before, but my code is still slow. What else can I try?

My code is here:

matched_word = []
for w in word_list:
    for str_ in dictionary:
        if str_ == w:
            matched_word.append(str_)

Some points of reference:

  • First, the length of word_list is 160,000, and the length of dictionary is about 200,000.
  • Second, I cannot turn word_list into a set, because matched_word must be a list that keeps duplicated words (the elements of word_list).
  • Third, the following code is still slow.
import collections
matched_word = collections.deque()  # call the constructor to get a deque instance
for w in dictionary:
    if w in word_list:
        matched_word.append(w)
  • Fourth, the following code is also still slow.
matched_word = [w for w in word_list if w in dictionary]

Thanks for your help. (Thanks also to everyone who advised before.)
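One thing that may be worth checking first: every snippet above tests membership against a list, so each `w in dictionary` scans up to 200,000 entries. A minimal sketch of an alternative, assuming the words are hashable strings and that a word should be appended once per occurrence in word_list (which is what the list comprehension above does): build a set from dictionary only and leave word_list as a list, so its duplicates and order are preserved. The helper name `match_words` and the sample data are hypothetical.

```python
def match_words(word_list, dictionary):
    """Return the elements of word_list that also occur in dictionary.

    word_list stays a list, so duplicates and order are preserved.
    Only dictionary is converted to a set.
    """
    dictionary_set = set(dictionary)  # built once; average O(1) membership tests
    return [w for w in word_list if w in dictionary_set]


# Hypothetical sample data, just to show that duplicates survive.
word_list = ["apple", "kiwi", "apple", "pear"]
dictionary = ["pear", "apple", "plum"]
matched_word = match_words(word_list, dictionary)
print(matched_word)  # ['apple', 'apple', 'pear']
```

If dictionary itself contains repeated entries, the original nested loop would append a word once per repeat, whereas this sketch would not; that difference is an assumption to verify against your data.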

CPython has a global interpreter lock (GIL) that lets only one thread execute Python bytecode at a time, which is why threads do not speed up CPU-bound code and why it can be slow in some cases. I'll give an example that you can adapt to your own code. Instead of threading we'll use multiprocessing; the difference may not be large, but try it anyway!

Example code:

from multiprocessing import Pool
import time


COUNTER = 50000000

def count(n):
    while n > 0:
        n -= 1

if __name__ == '__main__':
    pool = Pool(processes=2)  # Choose how many worker processes you want
    start = time.time()
    # First argument: the function; second: the list of arguments passed to it
    r1 = pool.apply_async(count, [COUNTER//2])  # Half the work each, because there are 2 processes; it can be more
    r2 = pool.apply_async(count, [COUNTER//2])
    pool.close()
    pool.join()
    end = time.time()
    print(f'Seconds: {end - start}')

That's it! Take a look at my code and try adapting it to your own. Maybe it helps!


 