
MultiThreading with a python loop

I am trying to run this Python code on several threads of my processor, but I can't figure out how to allocate multiple threads. I am using Python 2.7 in Jupyter (formerly IPython). The initial code is below (this part works perfectly). It is a web parser that takes x, a URL from my_list (a list of URLs), and then writes a CSV file (where out_string is one line).

Code without MultiThreading

my_list = ['http://stackoverflow.com/', 'http://google.com']

def main():
    with open('Extract.csv', 'w') as out_file:
        count_loop = 0
        for x in my_list:
            #================  Get title ==================#
            out_string = ""
            campaign = parseCampaign(x)
            out_string += ';' + str(campaign.getTitle())

            #================ Get Profile ==================#
            if campaign.getTitle() != 'NA':
                creator = parseCreator(campaign.getCreatorUrl())
                out_string += ';' + str(creator.getCreatorProfileLinkUrl())
            else:
                pass
            #================ Write ==================#
            out_string += '\n'
            out_file.write(out_string) 
            count_loop +=1
            print '---- %s on %s ------- ' %(count_loop, len(my_list))

Code with MultiThreading but not working

from threading import Thread
my_list = ['http://stackoverflow.com/', 'http://google.com']

def main(x):
    with open('Extract.csv', 'w') as out_file:
        count_loop = 0
        for x in my_list:
            #================  Get title ==================#
            out_string = ""
            campaign = parseCampaign(x)
            out_string += ';' + str(campaign.getTitle())

            #================ Get Profile ==================#
            if campaign.getTitle() != 'NA':
                creator = parseCreator(campaign.getCreatorUrl())
                out_string += ';' + str(creator.getCreatorProfileLinkUrl())
            else:
                pass
            #================ Write ==================#
            out_string += '\n'
            out_file.write(out_string) 
            count_loop +=1
            print '---- %s on %s ------- ' %(count_loop, len(my_list))

for x in my_list:
    t = Thread(target=main, args=(x,))
    t.start()
    t2 = Thread(target=main, args=(x,))
    t2.start()

I cannot find a good way to run this piece of code on more than one thread, and I am a bit confused because the documentation is not very easy to understand. On a single core, this code takes 2 hours; multi-threading would save me a lot of time!

Well... if the answer to:

Why would you assign two threads for the exact same task?

is:

to run the loop faster

(see the comments on the original post)

then something is pretty wrong here.

Dear OP, both of the threads will do exactly the same thing! This means that the first thread will do exactly the same work as the second one.

What you can do is something like the following:

import multiprocessing

nb_cores = 2  # put the number of cores you want to use

def do_my_process(this_argument):
    # Add the actual per-item code here
    pass

def main():

    pool = multiprocessing.Pool(processes=nb_cores)

    # arguments_list is your list of inputs (e.g. my_list)
    results_of_processes = [pool.apply_async(
        do_my_process,
        args=(an_argument, ),
        callback=None
    ) for an_argument in arguments_list]

    pool.close()
    pool.join()

Basically, you can think of each process/thread as having its own "mind". This means that in your code the first thread will run the process defined in main() for the argument x (taken from your iteration over your list), and the second one will do exactly the same task (the one in main()) again for x.

What you need is to formulate your process as a procedure with a set of input parameters and a set of outputs. Then you can create multiple processes, give each of them one of the desired input parameters, and each process will execute your main routine with its own parameter.
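For example, here is a minimal sketch of that idea. The worker name parse_one_url is hypothetical (it stands in for your parseCampaign/parseCreator logic and returns the CSV line for a single URL); the AsyncResult objects returned by apply_async are collected with .get() and written to the file once the workers have finished:

import multiprocessing

my_list = ['http://stackoverflow.com/', 'http://google.com']

def parse_one_url(url):
    # Hypothetical worker: build and return the CSV line for one URL.
    # Replace the body with the real parseCampaign/parseCreator calls.
    return url + ';NA\n'

def main():
    pool = multiprocessing.Pool(processes=2)

    # Each apply_async call returns an AsyncResult; .get() blocks until
    # that worker is done and returns the worker's return value.
    async_results = [pool.apply_async(parse_one_url, args=(url,))
                     for url in my_list]
    pool.close()
    pool.join()

    with open('Extract.csv', 'w') as out_file:
        for res in async_results:
            out_file.write(res.get())

if __name__ == '__main__':
    main()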

Hope it helps. See also the code above; I think you will understand it.

Also, see:

multiprocessing's map and asynchronous map (Pool.map and Pool.map_async)

and

functools.partial
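To illustrate why functools.partial is handy here: Pool.map passes exactly one argument to the worker, so any extra fixed parameters have to be bound beforehand. Below is a minimal sketch; parse_one_url and delimiter are illustrative names, not part of the original code, and on Python 2.7 the partial object may not be picklable for the worker processes, in which case a plain module-level wrapper function can be used instead:

import functools
import multiprocessing

def parse_one_url(url, delimiter):
    # Hypothetical worker that needs an extra, fixed parameter.
    return url + delimiter + 'NA\n'

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=2)

    # Bind the fixed argument; the resulting callable takes only the
    # single argument that Pool.map supplies (the URL).
    worker = functools.partial(parse_one_url, delimiter=';')

    lines = pool.map(worker, ['http://stackoverflow.com/', 'http://google.com'])

    pool.close()
    pool.join()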

Ok, let's break down your problem.

First of all, your main() method processes all the inputs and writes all the output to a file. When you run main in 2 threads, both threads do the same work. You need a method that processes only one input and returns the output for that input.

def process_x(x):
    #================  Get title ==================#
    out_string = ""
    campaign = parseCampaign(x)
    out_string += ';' + str(campaign.getTitle())

    #================ Get Profile ==================#
    if campaign.getTitle() != 'NA':
        creator = parseCreator(campaign.getCreatorUrl())
        out_string += ';' + str(creator.getCreatorProfileLinkUrl())
    else:
        pass
    #================ Write ==================#
    out_string += '\n'
    return out_string

Now you can call this method in multiple threads and get the output of each x separately.

from threading import Thread
my_list = ['http://stackoverflow.com/', 'http://google.com']
threads = list()
for x in my_list:
    t = Thread(target=process_x, args=(x,))
    t.start()
    threads.append(t)
for t in threads:
    t.join()

But the problem is that this will start n threads, where n is the number of elements in my_list. So using multiprocessing.Pool is better here. Use this instead:

from multiprocessing import Pool
pool = Pool(processes=4)              # start 4 worker processes
result_list = pool.map(process_x, my_list)

result_list will hold the result string for every element of the list, so now you can save it to a file.

with open('Extract.csv', 'w') as out_file:
    out_file.writelines(result_list)
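One follow-up detail: it is safest to create the Pool under an if __name__ == '__main__': guard (required on Windows, good practice elsewhere) and to close and join it when done. A minimal sketch of that wrapper, reusing process_x and my_list from above:

from multiprocessing import Pool

if __name__ == '__main__':
    pool = Pool(processes=4)              # start 4 worker processes
    result_list = pool.map(process_x, my_list)
    pool.close()                          # no more tasks will be submitted
    pool.join()                           # wait for the workers to exit

    with open('Extract.csv', 'w') as out_file:
        out_file.writelines(result_list)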
