
Appending an item to a list using multiprocessing in Python

I got this block of code:

def get_spain_accomodations():
    pool = Pool()
    links = soup.find_all('a', class_="hotel_name_link url")
    pool.map(get_page_links, links)

    #for a in soup.find_all('a', class_="hotel_name_link url"):
    #    hotel_url = "https://www.booking.com" + a['href'].strip()
    #    hotels_url_list.append(hotel_url)

def get_page_links(link):
    hotel_url = "https://www.booking.com" + link['href'].strip()
    hotels_url_list.append(hotel_url)

For some reason the hotel_url is not being appended to the list. If I try the commented-out loop it actually works, but not with the map() function. I also printed hotel_url for each get_page_links call and it worked. I have no idea what is going on. Below are the function calls.

init_BeautifulSoup()
get_spain_accomodations()
#get_hotels_wifi_rating()

for link in hotels_url_list:
    print link

The code executes without errors, but the link list is never printed.

It's important to understand that processes run in isolated areas of memory. Each process will have its own instance of hotels_url_list, and there is no (easy) way of "sticking" those values into the parent process's list: if you create an instance of list in the parent process, that instance is not the same one the subprocesses use. When you .fork() (i.e. create a subprocess), the memory of the parent process is cloned into the child process. So if the parent had a list instance in the hotels_url_list variable, the child process will also have a list instance (also called hotels_url_list), but they will not be the same object (they occupy different areas of memory).
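To see the isolation concretely, here is a minimal sketch of my own (not your code; the name just mirrors hotels_url_list): the child appends to its copy of the list, and the parent's copy stays empty.

from multiprocessing import Process

hotels_url_list = []  # module-level list, analogous to yours

def append_in_child(url):
    # Runs in the child process: it only modifies the child's own copy.
    hotels_url_list.append(url)
    print("child sees: %s" % hotels_url_list)

if __name__ == "__main__":
    p = Process(target=append_in_child, args=("https://www.booking.com/example",))
    p.start()
    p.join()
    # The parent's list is untouched, because the child worked on a clone.
    print("parent sees: %s" % hotels_url_list)  # prints: parent sees: []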

This doesn't happen with Threads. They do share memory.
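For contrast, a quick sketch (again mine, with placeholder hrefs) where a thread appends to the very same list object the main thread sees:

import threading

hotels_url_list = []  # shared between the main thread and the worker threads

def get_page_links(link):
    # Same list object as in the main thread, so this append is visible there.
    hotels_url_list.append("https://www.booking.com" + link)

threads = [threading.Thread(target=get_page_links, args=(href,))
           for href in ["/hotel/es/foo.html", "/hotel/es/bar.html"]]
for t in threads:
    t.start()
for t in threads:
    t.join()
print("Collected: %s" % hotels_url_list)  # both URLs show up (order may vary)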

I would say (it's not like I'm much of an expert here) that the canonical way of communicating between processes in this case would be a Queue: the child processes put things in the queue and the parent process grabs them:

from multiprocessing import Process, Queue


def get_spain_accomodations():
    q = Queue()
    processes = []
    links = ['http://foo.com', 'http://bar.com', 'http://baz.com']
    hotels_url_list = []
    for link in links:
        # Start one worker process per link; each worker gets its own memory,
        # but they all share the Queue for sending results back.
        p = Process(target=get_page_links, args=(link, q,))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
        # Collect the result each worker put on the queue.
        hotels_url_list.append(q.get())
    print("Collected: %s" % hotels_url_list)


def get_page_links(link, q):
    print("link==%s" % link)
    hotel_url = "https://www.booking.com" + link
    q.put(hotel_url)


if __name__ == "__main__":
    get_spain_accomodations()

This outputs each link prefixed with https://www.booking.com, the prefixing happening on independent processes:

link==http://foo.com
link==http://bar.com
link==http://baz.com
Collected: ['https://www.booking.comhttp://foo.com', 'https://www.booking.comhttp://bar.com', 'https://www.booking.comhttp://baz.com']

I don't know if it will help you, but to me it helps to see the Queue as a "shared file" that both processes know about. Imagine you have two completely different programs, and one of them knows it has to write things into a file called /tmp/foobar.txt while the other one knows it has to read from a file called /tmp/foobar.txt. That way they can "communicate" with each other. This paragraph is just a metaphor (although that's pretty much how Unix pipes work)... It's not like queues work exactly like that, but maybe it helps with understanding the concept? Dunno, really, maybe I made it more confusing...

Another way would be using Threads and collecting their return values, as explained here.
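For reference, one way to collect return values is a thread pool whose map() hands them back directly. This is a sketch of my own (assuming Python 3's concurrent.futures; the hrefs are placeholders), not the code from the linked answer:

from concurrent.futures import ThreadPoolExecutor

def get_page_link(link):
    # Runs in a worker thread; the return value is handed back by map().
    return "https://www.booking.com" + link

links = ["/hotel/es/foo.html", "/hotel/es/bar.html"]
with ThreadPoolExecutor(max_workers=4) as executor:
    hotels_url_list = list(executor.map(get_page_link, links))
print("Collected: %s" % hotels_url_list)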
