peewee multiprocess from separate db connection 'peewee.OperationalError: disk I/O error'
I am trying to use multiprocessing in order to run a CPU-intensive job in the background. I'd like this process to be able to use the peewee ORM to write its results to the SQLite database.

In order to do so, I am trying to override the Meta.database of my model class after process creation, so that I can have a separate db connection for my new process.
import multiprocessing

from peewee import Model
from playhouse.sqlite_ext import SqliteExtDatabase

def get_db():
    db = SqliteExtDatabase(path)  # path to the SQLite file
    return db

class BaseModel(Model):
    class Meta:
        database = get_db()
# Many other models

class Batch(BaseModel):
    def multi(self):
        def background_proc():
            # trying to override Meta's db connection.
            BaseModel._meta.database = get_db()
            job = Job.get_by_id(1)
            print("working in the background")
        process = multiprocessing.Process(target=background_proc)
        process.start()
Error when executing my_batch.multi():
Process Process-1:
Traceback (most recent call last):
  File "/Users/layne/.pyenv/versions/3.7.6/envs/jupyterlab/lib/python3.7/site-packages/peewee.py", line 3099, in execute_sql
    cursor.execute(sql, params or ())
sqlite3.OperationalError: disk I/O error

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/layne/.pyenv/versions/3.7.6/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
    self.run()
  File "/Users/layne/.pyenv/versions/3.7.6/lib/python3.7/multiprocessing/process.py", line 99, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/layne/Desktop/pydatasci/pydatasci/aidb/__init__.py", line 1249, in background_proc
    job = Job.get_by_id(1)
  File "/Users/layne/.pyenv/versions/3.7.6/envs/jupyterlab/lib/python3.7/site-packages/peewee.py", line 6395, in get_by_id
    return cls.get(cls._meta.primary_key == pk)
  File "/Users/layne/.pyenv/versions/3.7.6/envs/jupyterlab/lib/python3.7/site-packages/peewee.py", line 6384, in get
    return sq.get()
  File "/Users/layne/.pyenv/versions/3.7.6/envs/jupyterlab/lib/python3.7/site-packages/peewee.py", line 6807, in get
    return clone.execute(database)[0]
  File "/Users/layne/.pyenv/versions/3.7.6/envs/jupyterlab/lib/python3.7/site-packages/peewee.py", line 1886, in inner
    return method(self, database, *args, **kwargs)
  File "/Users/layne/.pyenv/versions/3.7.6/envs/jupyterlab/lib/python3.7/site-packages/peewee.py", line 1957, in execute
    return self._execute(database)
  File "/Users/layne/.pyenv/versions/3.7.6/envs/jupyterlab/lib/python3.7/site-packages/peewee.py", line 2129, in _execute
    cursor = database.execute(self)
  File "/Users/layne/.pyenv/versions/3.7.6/envs/jupyterlab/lib/python3.7/site-packages/peewee.py", line 3112, in execute
    return self.execute_sql(sql, params, commit=commit)
  File "/Users/layne/.pyenv/versions/3.7.6/envs/jupyterlab/lib/python3.7/site-packages/peewee.py", line 3106, in execute_sql
    self.commit()
  File "/Users/layne/.pyenv/versions/3.7.6/envs/jupyterlab/lib/python3.7/site-packages/peewee.py", line 2873, in __exit__
    reraise(new_type, new_type(exc_value, *exc_args), traceback)
  File "/Users/layne/.pyenv/versions/3.7.6/envs/jupyterlab/lib/python3.7/site-packages/peewee.py", line 183, in reraise
    raise value.with_traceback(tb)
  File "/Users/layne/.pyenv/versions/3.7.6/envs/jupyterlab/lib/python3.7/site-packages/peewee.py", line 3099, in execute_sql
    cursor.execute(sql, params or ())
peewee.OperationalError: disk I/O error
I got this working using threads instead, but it is hard to actually terminate a thread (as opposed to just breaking out of a loop), and CPU-intensive (not IO-bound) jobs should be run in separate processes.
UPDATE: looking into peewee's database proxy: http://docs.peewee-orm.com/en/latest/peewee/database.html#dynamically-defining-a-database
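The linked docs defer binding the models to a database until runtime via a proxy object. As a rough illustration of the pattern (this is a minimal stand-in sketch, not peewee's actual DatabaseProxy implementation), the proxy simply forwards attribute access to a real object supplied later:

```python
import sqlite3

class DatabaseProxy:
    # Minimal stand-in for the deferred-initialization pattern in the
    # linked peewee docs: attribute access is forwarded to a concrete
    # object that is supplied later via initialize().
    def __init__(self):
        self._obj = None

    def initialize(self, obj):
        self._obj = obj

    def __getattr__(self, name):
        if self._obj is None:
            raise RuntimeError("proxy accessed before initialize()")
        return getattr(self._obj, name)

proxy = DatabaseProxy()                        # models could bind to this up front
proxy.initialize(sqlite3.connect(":memory:"))  # e.g. once per process, at startup
proxy.execute("CREATE TABLE t (x)")            # forwarded to the real connection
```

Because each process can call initialize() with its own connection, nothing opened in the parent ever has to be reused in a child.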
I believe the problem was that, within the separate process, I was not closing the existing connection before attempting to replace it with a new one.
def background_proc():
    db = BaseModel._meta.database
    db.close()  # <----- this
    BaseModel._meta.database = get_db()
This works, and I can continue to use the original connection in my main process (or whichever non-multiprocess caller there is).
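The underlying rule (a SQLite connection opened in one process must not be reused in another) can be demonstrated with the standard library alone. A minimal sketch, using sqlite3 directly rather than peewee, with a hypothetical job table; the child process opens its own connection instead of reusing the parent's:

```python
import multiprocessing
import os
import sqlite3
import tempfile

def background_proc(path):
    # Open a fresh connection inside the child process; a connection
    # inherited from the parent must not be reused after the fork.
    db = sqlite3.connect(path)
    db.execute("INSERT INTO job (status) VALUES (?)", ("done",))
    db.commit()
    db.close()

if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE job (status TEXT)")
    db.commit()
    db.close()  # close the parent's connection before starting the child

    process = multiprocessing.Process(target=background_proc, args=(path,))
    process.start()
    process.join()
```

Passing only the file path (not the connection) to the child mirrors the fix above: each process constructs its own handle to the database.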
Maybe initializing the DB object in each process will help you.
def get_db():
    db = SqliteExtDatabase(path)  # path to the SQLite file
    return db

class BaseModel(Model):
    def __init__(self, database, **kwargs):
        super().__init__(**kwargs)
        self.database = database
# Many other models

class Batch(BaseModel):
    def multi(self):
        def background_proc():
            # build a fresh db connection inside the new process.
            db = get_db()
            basemodel = BaseModel(db)
            # do something like basemodel.insert(name="Alex")
            job = Job(db)
            result = job.get_by_id(1)
            print(result)
            print("working in the background")
        process = multiprocessing.Process(target=background_proc)
        process.start()