
Forking, sqlalchemy, and scoped sessions

I'm getting the following error, which I assume is caused by the forking in my application: "This result object does not return rows".

Traceback
---------
File "/opt/miniconda/envs/analytical-engine/lib/python2.7/site-packages/dask/async.py", line 263, in execute_task
result = _execute_task(task, data)
File "/opt/miniconda/envs/analytical-engine/lib/python2.7/site-packages/dask/async.py", line 245, in _execute_task
return func(*args2)
File "/opt/miniconda/envs/analytical-engine/lib/python2.7/site-packages/smg/analytics/services/impact_analysis.py", line 140, in _do_impact_analysis_mp
 Correlation.user_id.in_(user_ids)).all())
File "/opt/miniconda/envs/analytical-engine/lib/python2.7/site-packages/sqlalchemy/orm/query.py", line 2241, in all
return list(self)
File "/opt/miniconda/envs/analytical-engine/lib/python2.7/site-packages/sqlalchemy/orm/loading.py", line 65, in instances
fetch = cursor.fetchall()
File "/opt/miniconda/envs/analytical-engine/lib/python2.7/site-packages/sqlalchemy/engine/result.py", line 752, in fetchall
self.cursor, self.context)
File "/opt/miniconda/envs/analytical-engine/lib/python2.7/site-packages/sqlalchemy/engine/base.py", line 1027, in _handle_dbapi_exception
util.reraise(*exc_info)
File "/opt/miniconda/envs/analytical-engine/lib/python2.7/site-packages/sqlalchemy/engine/result.py", line 746, in fetchall
l = self.process_rows(self._fetchall_impl())
File "/opt/miniconda/envs/analytical-engine/lib/python2.7/site-packages/sqlalchemy/engine/result.py", line 715, in _fetchall_impl
self._non_result()
File "/opt/miniconda/envs/analytical-engine/lib/python2.7/site-packages/sqlalchemy/engine/result.py", line 720, in _non_result
"This result object does not return rows. "

I'm using dask and its multiprocessing scheduler (which uses multiprocessing.Pool). As I understand it (based on the documentation), sessions created from a scoped session object (created via scoped_session()) are threadsafe, because they are threadlocal. This leads me to believe that when I call Session() (or use the proxy Session), I get a session object that only exists in, and is only accessible from, the thread it was called from. That seems pretty straightforward.
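To illustrate what I mean by threadlocal, here is a minimal sketch (not from my application; it uses an in-memory SQLite engine as a stand-in):

import threading

from sqlalchemy import create_engine
from sqlalchemy.orm import scoped_session, sessionmaker

engine = create_engine('sqlite://')  # in-memory stand-in database
Session = scoped_session(sessionmaker(bind=engine))

sessions = {}

def grab(label):
    # Repeated calls within one thread return the very same session object.
    assert Session() is Session()
    sessions[label] = Session()

grab('main')                       # main thread
t = threading.Thread(target=grab, args=('child',))
t.start()
t.join()
# Each thread got its own distinct session from the same registry.
print(sessions['main'] is sessions['child'])  # False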

What I am confused about is why I'm having issues when forking the process. I understand that you can't re-use an engine across processes, so I've followed the event-based solution from the docs verbatim and done this:

import os

from sqlalchemy import create_engine, event, exc
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import scoped_session, sessionmaker


class _DB(object):

    _engine = None

    @classmethod
    def _get_engine(cls, force_new=False):
        # Lazily create one engine per process; force_new rebuilds it.
        if cls._engine is None or force_new is True:
            cfg = Config.get_config()
            user = cfg['USER']
            host = cfg['HOST']
            password = cfg['PASSWORD']
            database = cfg['DATABASE']
            engine = create_engine(
                'mysql://{}:{}@{}/{}?local_infile=1&'
                'unix_socket=/var/run/mysqld/mysqld.sock'.format(
                    user, password, host, database),
                pool_size=5, pool_recycle=3600)
            cls._engine = engine
        return cls._engine



# From the docs, handles multiprocessing
@event.listens_for(_DB._get_engine(), "connect")
def connect(dbapi_connection, connection_record):
    connection_record.info['pid'] = os.getpid()

# From the docs, handles multiprocessing
@event.listens_for(_DB._get_engine(), "checkout")
def checkout(dbapi_connection, connection_record, connection_proxy):
    pid = os.getpid()
    if connection_record.info['pid'] != pid:
        connection_record.connection = connection_proxy.connection = None
        raise exc.DisconnectionError(
            "Connection record belongs to pid %s, "
            "attempting to check out in pid %s" %
            (connection_record.info['pid'], pid)
        )


# The following is how I create the scoped session object.

Session = scoped_session(sessionmaker(
    bind=_DB._get_engine(), autocommit=False, autoflush=False))

Base = declarative_base()
Base.query = Session.query_property()
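For context, query_property() is what makes the Correlation.query usage in the traceback work; a hypothetical model along those lines (the table name and columns are guesses) would be:

from sqlalchemy import Column, Integer

class Correlation(Base):
    __tablename__ = 'correlation'  # table name is a guess
    id = Column(Integer, primary_key=True)
    user_id = Column(Integer)

# Equivalent to Session.query(Correlation).filter(...).all()
correlations = (Correlation.query.
                filter(Correlation.user_id.in_([1, 2, 3])).all())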

So my assumptions (based on the docs) are the following:

  1. A session object created from a scoped session object must always give me a threadlocal session (which in my case would just be the main thread of the child process). Although this isn't in the docs, I imagine it should apply even if the scoped session object was created in another process.

  2. The threadlocal session will get a connection from the pool via the engine. If the connection was not created within this process, it will create a new one (based on the connect() and checkout() implementations above).

If both of these things were true, then everything should "just work" (AFAICT). That's not the case, though.

I managed to get it to work by creating a new scoped session object in each new process, and using it in all subsequent calls that use a session.

BTW, the Base.query attribute needed to be updated from this new scoped session object as well.
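A rough sketch of that workaround (the function names here are mine, not the real code):

def _init_worker():
    # Build a fresh engine and scoped session inside the child process.
    engine = _DB._get_engine(force_new=True)
    session = scoped_session(sessionmaker(
        bind=engine, autocommit=False, autoflush=False))
    Base.query = session.query_property()  # rebind the query property too
    return session

def _do_work(user_ids):
    session = _init_worker()
    return session.query(Correlation).filter(
        Correlation.user_id.in_(user_ids)).all()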

I imagine that my assumption #1 above is incorrect. Can anyone help me understand why I need to create a new scoped session object in each process?

Cheers.

It is not clear when your fork happens, but the most common issue is that the engine is created before the fork. The engine initializes TCP connections to the database (your pool_size=5), and those connections then get copied over to the new processes, resulting in multiple processes interacting with the same physical sockets => troubles.

Options are to (rough sketches follow the list):

  • Disable the pool and use an on-demand connection: poolclass=NullPool
  • Re-create the pool after the fork: sqla_engine.dispose()
  • Delay the create_engine() until after the fork
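Sketches of those three options (the URL and worker function are placeholders, not from the question):

from multiprocessing import Pool

from sqlalchemy import create_engine
from sqlalchemy.pool import NullPool

DB_URL = 'mysql://user:password@host/database'  # placeholder

# Option 1: disable pooling entirely; every checkout opens a fresh
# DBAPI connection, so no sockets are shared across the fork.
engine = create_engine(DB_URL, poolclass=NullPool)

# Option 2: keep the pool, but discard the connections inherited from
# the parent as soon as each worker process starts.
def _worker_init():
    engine.dispose()  # the pool re-opens connections lazily afterwards

pool = Pool(processes=4, initializer=_worker_init)

# Option 3: don't call create_engine() until you are already inside the
# child process (e.g. _DB._get_engine(force_new=True) in each worker),
# so the fork never copies an initialized pool.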
