
How to share a single SQLite connection in a multi-threaded Python application

I am trying to write a multi-threaded Python application in which a single SQLite connection is shared among threads. I am unable to get this to work. The real application is a CherryPy web server, but the following simple code demonstrates my problem.

What change or changes do I need to make to run the sample code below successfully?

When I run this program with THREAD_COUNT set to 1, it works fine and my database is updated as I expect (that is, the letter "X" is appended to the text value in the SectorGroup column).

When I run it with THREAD_COUNT set to anything higher than 1, all threads but one terminate prematurely with SQLite-related exceptions. Different threads throw different exceptions (with no discernible pattern), including:

OperationalError: cannot start a transaction within a transaction 

(occurs on the UPDATE statement)

OperationalError: cannot commit - no transaction is active 

(occurs on the .commit() call)

InterfaceError: Error binding parameter 0 - probably unsupported type. 

(occurs on the UPDATE and the SELECT statements)

IndexError: tuple index out of range

(this one has me completely puzzled; it occurs on the statement group = rows[0][0] or '', but only when multiple threads are running)

Here is the code:

import sqlite3
from threading import Thread, current_thread

# A single connection object shared by every thread
CONNECTION = sqlite3.connect('./database/mydb', detect_types=sqlite3.PARSE_DECLTYPES, check_same_thread=False)
CONNECTION.row_factory = sqlite3.Row

def commands(start_id):

    # loop over 100 records, read the SectorGroup column, and write it back with "X" appended.
    for inv_id in range(start_id, start_id + 100):

        rows = CONNECTION.execute('SELECT SectorGroup FROM Investment WHERE InvestmentID = ?;', [inv_id]).fetchall()
        if rows:
            group = rows[0][0] or ''
            msg = '{} inv {} = {}'.format(current_thread().name, inv_id, group)
            print(msg)
            CONNECTION.execute('UPDATE Investment SET SectorGroup = ? WHERE InvestmentID = ?;', [group + 'X', inv_id])

        CONNECTION.commit()

if __name__ == '__main__':

    THREAD_COUNT = 10

    # Each thread processes its own block of 100 InvestmentIDs
    for i in range(THREAD_COUNT):
        t = Thread(target=commands, args=(i * 100,))
        t.start()

It's not safe to share a connection between threads; at the very least you need to use a lock to serialize access. Also read http://docs.python.org/2/library/sqlite3.html#multithreading, as older SQLite versions have more issues still.
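A minimal sketch of the lock approach applied to the question's loop. To make it runnable on its own, it assumes an in-memory database (instead of './database/mydb') seeded with 1000 rows, and `DB_LOCK` is a hypothetical name. The lock is held across the whole SELECT/UPDATE/commit sequence rather than around individual statements, so one thread's read-modify-write cannot be interleaved with another's:

```python
import sqlite3
import threading

# Assumption: an in-memory table stands in for the Investment table in './database/mydb'.
CONNECTION = sqlite3.connect(':memory:', check_same_thread=False)
CONNECTION.execute('CREATE TABLE Investment (InvestmentID INTEGER PRIMARY KEY, SectorGroup TEXT)')
CONNECTION.executemany('INSERT INTO Investment VALUES (?, ?)', [(i, '') for i in range(1000)])
CONNECTION.commit()

DB_LOCK = threading.Lock()  # hypothetical name: one lock guards every use of CONNECTION


def commands(start_id):
    for inv_id in range(start_id, start_id + 100):
        # Hold the lock for the whole read-modify-write-commit sequence so no
        # other thread can run statements on the shared connection in between.
        with DB_LOCK:
            rows = CONNECTION.execute(
                'SELECT SectorGroup FROM Investment WHERE InvestmentID = ?;', [inv_id]).fetchall()
            if rows:
                group = rows[0][0] or ''
                CONNECTION.execute(
                    'UPDATE Investment SET SectorGroup = ? WHERE InvestmentID = ?;',
                    [group + 'X', inv_id])
            CONNECTION.commit()


threads = [threading.Thread(target=commands, args=(i * 100,)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(CONNECTION.execute("SELECT COUNT(*) FROM Investment WHERE SectorGroup = 'X'").fetchone()[0])  # → 1000
```

Holding the lock this long means the threads do their database work one at a time; that is the intended trade of concurrency for correctness.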

The check_same_thread option appears to be deliberately under-documented in that respect; see http://bugs.python.org/issue16509.
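To see what `check_same_thread` actually guards, the sketch below runs a query on a connection from a second thread; `use_from_other_thread` is a hypothetical helper written for this illustration. With the default `check_same_thread=True`, the sqlite3 module raises `ProgrammingError`. Passing `False` only disables that check; per the Python docs, serializing access is then up to you:

```python
import sqlite3
import threading


def use_from_other_thread(conn):
    """Run a trivial query on conn from a new thread; return the error raised, if any."""
    errors = []

    def worker():
        try:
            conn.execute('SELECT 1').fetchall()
        except sqlite3.ProgrammingError as exc:
            errors.append(exc)

    t = threading.Thread(target=worker)
    t.start()
    t.join()
    return errors[0] if errors else None


default_conn = sqlite3.connect(':memory:')                          # check_same_thread defaults to True
relaxed_conn = sqlite3.connect(':memory:', check_same_thread=False)  # check disabled, no locking added

print(use_from_other_thread(default_conn))  # a ProgrammingError about using the object in another thread
print(use_from_other_thread(relaxed_conn))  # None
```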

You could use a connection per thread instead, or look to SQLAlchemy for a connection pool (and a very efficient statement-of-work and queuing system to boot).

I ran into the SQLite threading problem when writing a simple WSGI server for fun and learning. Under Apache (for example with mod_wsgi), a WSGI application typically runs in multiple threads. The following code seems to work for me:

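import sqlite3
import threading

class LockableCursor:
    """Wraps a single shared cursor so that only one thread uses it at a time."""

    def __init__(self, cursor):
        self.cursor = cursor
        self.lock = threading.Lock()

    def execute(self, arg0, arg1=None):
        # execute(sql)          -> run a statement, return None
        # execute('all', sql)   -> run a query, return fetchall()
        # execute('one', sql)   -> run a query, return fetchone()
        if arg1 is not None and arg0 not in ('all', 'one'):
            # Reject a bad fetch mode before touching the database
            raise ValueError("fetch mode must be 'all' or 'one', not {!r}".format(arg0))

        # 'with' releases the lock even if execute() or a fetch raises, and unlike
        # a 'return' inside 'finally', it does not swallow the exception
        with self.lock:
            if arg1 is None:
                self.cursor.execute(arg0)
                return None
            self.cursor.execute(arg1)
            if arg0 == 'all':
                return self.cursor.fetchall()
            return self.cursor.fetchone()

def dictFactory(cursor, row):
    # Row factory: return each row as a {column name: value} dict
    aDict = {}
    for iField, field in enumerate(cursor.description):
        aDict[field[0]] = row[iField]
    return aDict

class Db:
    def __init__(self, app):
        self.app = app

    def connect(self):
        # check_same_thread=False allows use from other threads (LockableCursor
        # serializes that use); isolation_level=None puts the connection in autocommit mode
        self.connection = sqlite3.connect(self.app.dbFileName, check_same_thread=False, isolation_level=None)  # Will create db if nonexistent
        self.connection.row_factory = dictFactory
        self.cs = LockableCursor(self.connection.cursor())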

Example of use:

if not ok and self.user:    # Not logged out
    # Get role data for any later use
    userIdsRoleIds = self.cs.execute ('all', 'SELECT role_id FROM users_roles WHERE user_id == {}'.format (self.user ['id']))

    for userIdRoleId in userIdsRoleIds:
        self.userRoles.append (self.cs.execute ('one', 'SELECT name FROM roles WHERE id == {}'.format (userIdRoleId ['role_id'])))

Another example:

# Create users table and insert default user (SQL string literals take single quotes; double quotes denote identifiers)
self.cs.execute ('CREATE TABLE users (id INTEGER PRIMARY KEY, email_address, password, token)')
self.cs.execute ("INSERT INTO users (email_address, password) VALUES ('{}', '{}')".format (self.app.defaultUserEmailAddress, self.app.defaultUserPassword))

# Create roles table and insert default role
self.cs.execute ('CREATE TABLE roles (id INTEGER PRIMARY KEY, name)')
self.cs.execute ("INSERT INTO roles (name) VALUES ('{}')".format (self.app.defaultRoleName))

# Create users_roles table and assign default role to default user
self.cs.execute ('CREATE TABLE users_roles (id INTEGER PRIMARY KEY, user_id, role_id)')

defaultUserId = self.cs.execute ('one', "SELECT id FROM users WHERE email_address = '{}'".format (self.app.defaultUserEmailAddress)) ['id']
defaultRoleId = self.cs.execute ('one', "SELECT id FROM roles WHERE name = '{}'".format (self.app.defaultRoleName)) ['id']

self.cs.execute ('INSERT INTO users_roles (user_id, role_id) VALUES ({}, {})'.format (defaultUserId, defaultRoleId))

A complete program using this construction can be downloaded from http://www.josmith.org/

NB: The code above is experimental; there may be (fundamental) issues when using it with (many) concurrent requests (e.g. as part of a WSGI server). Performance is not critical for my application. The simplest thing would probably have been to just use MySQL, but I like to experiment a little, and SQLite's zero-installation setup appealed to me. If anyone thinks the code above is fundamentally flawed, please respond, as my purpose is to learn. If not, I hope this is useful to others.
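One flaw worth pointing out in the examples above is that values are pasted into the SQL with `str.format`, so a value containing a quote breaks the statement, and user-supplied values open the door to SQL injection. Below is a hedged sketch of a variant that passes values as bound `?` parameters instead; `ParamLockableCursor` and its `fetch` argument are hypothetical names, not part of the original code:

```python
import sqlite3
import threading


class ParamLockableCursor:
    """Hypothetical variant of LockableCursor that binds values with '?'
    placeholders instead of formatting them into the SQL string."""

    def __init__(self, connection):
        self.cursor = connection.cursor()
        self.lock = threading.Lock()

    def execute(self, sql, params=(), fetch=None):
        # fetch: None (no result), 'all' (fetchall) or 'one' (fetchone)
        if fetch not in (None, 'all', 'one'):
            raise ValueError("fetch must be None, 'all' or 'one'")
        with self.lock:  # one thread at a time, released even on error
            self.cursor.execute(sql, params)
            if fetch == 'all':
                return self.cursor.fetchall()
            if fetch == 'one':
                return self.cursor.fetchone()
            return None


conn = sqlite3.connect(':memory:', check_same_thread=False, isolation_level=None)
cs = ParamLockableCursor(conn)
cs.execute('CREATE TABLE users (id INTEGER PRIMARY KEY, email_address, password, token)')

email = "o'brien@example.com"  # the quote would break the str.format() version
cs.execute('INSERT INTO users (email_address) VALUES (?)', (email,))
print(cs.execute('SELECT id FROM users WHERE email_address = ?', (email,), fetch='one'))  # → (1,)
```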

I'm guessing here, but it looks like you are doing this for performance reasons.

Python threads won't make this faster in any meaningful way: SQLite allows only one writer at a time, so the concurrent updates are serialized anyway. Instead, use SQLite transactions, which are very fast.

If you do all your updates in a single transaction, you'll see roughly an order-of-magnitude speedup.
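A sketch of the single-transaction approach. It uses an in-memory stand-in for the Investment table, an assumption made only so it runs on its own; the speedup shows up on a file database, where each separate commit has to wait for data to be synced to disk. All the UPDATEs go through one `executemany` call and are committed once when the `with conn:` block exits (`append_x_to_all` is a hypothetical name):

```python
import sqlite3

# Assumption: an in-memory table stands in for the Investment table in './database/mydb'.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE Investment (InvestmentID INTEGER PRIMARY KEY, SectorGroup TEXT)')
conn.executemany('INSERT INTO Investment VALUES (?, ?)', [(i, None) for i in range(1000)])
conn.commit()


def append_x_to_all(conn, start_id, stop_id):
    """Append 'X' to SectorGroup for IDs in [start_id, stop_id), committing once."""
    with conn:  # commits once on success, rolls back if anything raises
        rows = conn.execute(
            'SELECT InvestmentID, SectorGroup FROM Investment '
            'WHERE InvestmentID >= ? AND InvestmentID < ?', (start_id, stop_id)).fetchall()
        conn.executemany(
            'UPDATE Investment SET SectorGroup = ? WHERE InvestmentID = ?',
            [((group or '') + 'X', inv_id) for inv_id, group in rows])


append_x_to_all(conn, 0, 1000)
print(conn.execute("SELECT COUNT(*) FROM Investment WHERE SectorGroup = 'X'").fetchone()[0])  # → 1000
```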
