Can't pickle psycopg2.extensions.connection objects when using pool.imap, but can be done in individual processes

I am trying to build an application which will "check out" a cell, which is a square covering a part of land in a geographic database, and perform an analysis of the features within that cell. Since I have many cells to process, I am using a multiprocessing approach.

I had it somewhat working inside of my object like this:

class DistributedGeographicConstraintProcessor:

    ...

    def _process_cell(self, conn_string):

        conn = pg2.connect(conn_string)
        try:
            cur = conn.cursor()

            cell_id = self._check_out_cell(cur)
            conn.commit()
            print(f"processing cell_id {cell_id}...")

            for constraint in self.constraints:
                # print(f"processing {constraint.name()}...")
                query = constraint.prepare_distributed_query(self.job, self.grid)
                cur.execute(query, {
                    "buffer": constraint.buffer(),
                    "cell_id": cell_id,
                    "name": constraint.name(),
                    "simplify_tolerance": constraint.simplify_tolerance()
                })

            # TODO: do a final race condition check to further suppress duplicates
            self._check_in_cell(cur, cell_id)
            conn.commit()

        finally:
            del cur
            conn.close()

        return None

    def run(self):

        while True:
            if not self._job_finished():
                params = [self.conn_string] * self.num_cores
                processes = []
                for param in params:
                    process = mp.Process(target=self._process_cell, args=(param,))
                    processes.append(process)
                    sleep(0.1)  # Prevent multiple processes from checking out the same grid square
                    process.start()
                for process in processes:
                    process.join()
            else:
                self._finalize_job()
                break

But the problem is that it will only start four processes and wait until they all finish before starting four new processes.

I want to make it so when one process finishes its work, it will begin working on the next cell immediately, even if its co-processes are not yet finished.

I am unsure about how to implement this and I have tried using a pool like this:

def run(self):

    pool = mp.Pool(self.num_cores)
    unprocessed_cells = self._unprocessed_cells()
    for i in pool.imap(self._process_cell, unprocessed_cells):
        print(i)

But this just tells me that the connection is not able to be pickled:

TypeError: can't pickle psycopg2.extensions.connection objects

But I do not understand why, because it is the exact same function that I am using in the imap function as in the Process target.

I have already looked at similar threads, but they do not answer my question.

My guess is that you're attaching some connection object to self; try to rewrite your solution using functions only (no classes/methods).
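
That guess fits the traceback: mp.Process with the fork start method (the default on Linux) hands the child a copy of self without pickling anything, while pool.imap must pickle the callable to ship it to the workers, and pickling the bound method self._process_cell means pickling self and everything attached to it, connection included. Below is a minimal function-only sketch of such a rewrite; the process_cell/run_pool names and the functools.partial wiring are illustrative assumptions, not the poster's tested code:

import multiprocessing as mp
from functools import partial

import psycopg2 as pg2

def process_cell(conn_string, cell_id):
    # Only picklable arguments cross the process boundary;
    # the connection is opened inside the worker itself.
    conn = pg2.connect(conn_string)
    try:
        cur = conn.cursor()
        # ... run the per-cell constraint queries here ...
        conn.commit()
    finally:
        conn.close()
    return cell_id

def run_pool(conn_string, unprocessed_cells, num_cores):
    with mp.Pool(num_cores) as pool:
        # a partial over a module-level function is picklable
        worker = partial(process_cell, conn_string)
        for cell_id in pool.imap_unordered(worker, unprocessed_cells):
            print(f"finished cell_id {cell_id}")

Since the pool hands each idle worker the next cell as soon as it is free, this also gives the "start on the next cell immediately" scheduling the question asks for.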

Here is a simplified version of a single producer/multiple workers solution I used some time ago:

import time
from multiprocessing import Pool

def worker(param):
    # connect to pg
    # do work
    pass


def main():
    # NUM_PROC and params are placeholders defined elsewhere
    pool = Pool(processes=NUM_PROC)
    tasks = []
    for param in params:
        t = pool.apply_async(worker, args=(param,))
        tasks.append(t)
    pool.close()
    finished = False
    while not finished:
        finished = True
        for t in tasks:
            if not t.ready():
                finished = False
                break
        time.sleep(1)
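
If progress polling is not needed, the same wait can be expressed by joining the pool after closing it (a sketch using the same pool object as above):

pool.close()  # no further tasks may be submitted
pool.join()   # blocks until every outstanding task has finished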
