
Python sqlite3 and concurrency

I have a Python program that uses the "threading" module. Once every second, my program starts a new thread that fetches some data from the web and stores this data to my hard drive. I would like to use sqlite3 to store these results, but I can't get it to work. The issue seems to be about the following line:

conn = sqlite3.connect("mydatabase.db")
  • If I put this line of code inside each thread, I get an OperationalError telling me that the database file is locked. I guess this means that another thread has mydatabase.db open through a sqlite3 connection and has locked it.
  • If I put this line of code in the main program and pass the connection object (conn) to each thread, I get a ProgrammingError, saying that SQLite objects created in a thread can only be used in that same thread.

Previously I was storing all my results in CSV files and did not have any of these file-locking issues. Hopefully this will be possible with sqlite. Any ideas?

Contrary to popular belief, newer versions of sqlite3 do support access from multiple threads.

This can be enabled via the optional keyword argument check_same_thread:

sqlite3.connect(":memory:", check_same_thread=False)
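
A minimal sketch of sharing one connection this way (the results table is illustrative). Note that check_same_thread=False only disables sqlite3's same-thread check; it does not serialize your transactions, so access is guarded with a lock:

import sqlite3
import threading

conn = sqlite3.connect("mydatabase.db", check_same_thread=False)
conn.execute("CREATE TABLE IF NOT EXISTS results (value INTEGER)")
db_lock = threading.Lock()

def store(value):
    with db_lock:  # one writer at a time avoids "database is locked" errors
        conn.execute("INSERT INTO results (value) VALUES (?)", (value,))
        conn.commit()

threads = [threading.Thread(target=store, args=(i,)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()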

You can use the producer-consumer pattern. For example, you can create a queue that is shared between threads. The first thread, which fetches data from the web, enqueues this data in the shared queue. Another thread, which owns the database connection, dequeues data from the queue and passes it to the database.
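
A minimal sketch of that pattern (the fetch is stubbed out and the results table is illustrative); the sqlite3 connection is created and used only inside the writer thread:

import queue
import sqlite3
import threading

q = queue.Queue()

def writer():
    conn = sqlite3.connect("mydatabase.db")  # lives in this thread only
    conn.execute("CREATE TABLE IF NOT EXISTS results (value TEXT)")
    while True:
        item = q.get()
        if item is None:  # sentinel: no more work
            break
        conn.execute("INSERT INTO results (value) VALUES (?)", (item,))
        conn.commit()
    conn.close()

def fetcher(n):
    q.put(f"payload-{n}")  # stand-in for the real web fetch

w = threading.Thread(target=writer)
w.start()
fetchers = [threading.Thread(target=fetcher, args=(i,)) for i in range(5)]
for t in fetchers:
    t.start()
for t in fetchers:
    t.join()
q.put(None)  # tell the writer to finish
w.join()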

Switch to multiprocessing. It is much better, scales well, can go beyond the use of multiple cores by using multiple CPUs, and the interface is the same as using the Python threading module.
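
A hedged sketch of that approach (fetch is a stand-in for a real HTTP request, and the table is illustrative): worker processes fetch in parallel, while the parent process owns the sqlite3 connection and performs all the writes:

import sqlite3
from multiprocessing import Pool

def fetch(url):
    return url, f"payload for {url}"  # stand-in for a real web request

if __name__ == "__main__":
    urls = [f"http://example.com/{i}" for i in range(5)]
    conn = sqlite3.connect("mydatabase.db")
    conn.execute("CREATE TABLE IF NOT EXISTS results (url TEXT, body TEXT)")
    with Pool() as pool:
        # All writes happen here, in the parent process only.
        for url, body in pool.imap_unordered(fetch, urls):
            conn.execute("INSERT INTO results VALUES (?, ?)", (url, body))
    conn.commit()
    conn.close()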

Or, as Ali suggested, just use SQLAlchemy's thread pooling mechanism. It will handle everything for you automatically and has many extra features, just to quote some of them (a short usage sketch follows after the list):

  1. SQLAlchemy includes dialects for SQLite, Postgres, MySQL, Oracle, MS-SQL, Firebird, MaxDB, MS Access, Sybase and Informix; IBM has also released a DB2 driver. So you don't have to rewrite your application if you decide to move away from SQLite.
  2. The Unit of Work system, a central part of SQLAlchemy's Object Relational Mapper (ORM), organizes pending create/insert/update/delete operations into queues and flushes them all in one batch. To accomplish this it performs a topological "dependency sort" of all modified items in the queue so as to honor foreign key constraints, and groups redundant statements together where they can sometimes be batched even further. This produces the maximum efficiency and transaction safety, and minimizes chances of deadlocks.
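
A short sketch of what the pooled-engine approach might look like here, assuming SQLAlchemy 2.0-style usage and an illustrative results table; check_same_thread=False is passed through explicitly as a precaution:

import threading
from sqlalchemy import create_engine, text

# The engine manages a pool of connections; each thread checks one out.
engine = create_engine(
    "sqlite:///mydatabase.db",
    connect_args={"check_same_thread": False},
)
with engine.connect() as setup:
    setup.execute(text("CREATE TABLE IF NOT EXISTS results (value INTEGER)"))
    setup.commit()

def worker(n):
    with engine.connect() as conn:
        conn.execute(text("INSERT INTO results (value) VALUES (:v)"), {"v": n})
        conn.commit()

threads = [threading.Thread(target=worker, args=(i,)) for i in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()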

You shouldn't be using threads at all for this. This is a trivial task for twisted and that would likely take you significantly further anyway.

Use only one thread, and have the completion of the request trigger an event to do the write.

twisted will take care of the scheduling, callbacks, etc. for you. It'll hand you the entire result as a string, or you can run it through a stream-processor (I have a twitter API and a friendfeed API that both fire off events to callers as results are still being downloaded).

Depending on what you're doing with your data, you could just dump the full result into sqlite as it's complete, cook it and dump it, or cook it while it's being read and dump it at the end.
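
A rough skeleton of that single-threaded design, assuming Twisted's Agent API; the URL and table are illustrative, and all sqlite3 access stays in the reactor thread:

import sqlite3
from twisted.internet import reactor, task
from twisted.web.client import Agent, readBody

conn = sqlite3.connect("mydatabase.db")  # used only from the reactor thread
conn.execute("CREATE TABLE IF NOT EXISTS results (body BLOB)")
agent = Agent(reactor)

def store(body):
    conn.execute("INSERT INTO results (body) VALUES (?)", (body,))
    conn.commit()

def fetch():
    d = agent.request(b"GET", b"http://example.com/data")
    d.addCallback(readBody)   # collect the response body as bytes
    d.addCallback(store)      # the write fires in the reactor thread
    return d

loop = task.LoopingCall(fetch)
loop.start(1.0)  # fire once per second
reactor.run()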

I have a very simple application on github that does something close to what you're wanting. I call it pfetch (parallel fetch). It grabs various pages on a schedule, streams the results to a file, and optionally runs a script upon successful completion of each one. It also does some fancy stuff like conditional GETs, but still could be a good base for whatever you're doing.

"

You need to call session.close() after every transaction against the database, so that a cursor is only ever used within the thread that created it; using the same cursor across multiple threads is what causes this error.
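
A minimal sketch of that advice, assuming a SQLAlchemy sessionmaker and an existing results(value) table: each transaction gets a fresh session that is closed before the thread moves on.

from sqlalchemy import create_engine, text
from sqlalchemy.orm import sessionmaker

engine = create_engine("sqlite:///mydatabase.db")
Session = sessionmaker(bind=engine)

def save(value):
    session = Session()  # fresh session (and cursor) for this transaction
    try:
        session.execute(text("INSERT INTO results (value) VALUES (:v)"), {"v": value})
        session.commit()
    finally:
        session.close()  # release it so it never crosses threads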

"

I could not find any benchmarks in any of the above answers, so I wrote a test to benchmark everything.

I tried 3 approaches:

  1. Reading and writing sequentially from the SQLite database
  2. Using a ThreadPoolExecutor to read/write
  3. Using a ProcessPoolExecutor to read/write

The results and takeaways from the benchmark are as follows:

  1. Sequential reads/sequential writes work the best
  2. If you must process in parallel, use the ProcessPoolExecutor to read in parallel (see the sketch after this list)
  3. Do not perform any writes with either the ThreadPoolExecutor or the ProcessPoolExecutor, as you will run into database-locked errors and will have to retry inserting the chunk
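
A sketch of takeaway 2, with an illustrative results table keyed by id; each worker process opens its own connection, since sqlite3 connections cannot be shared across processes:

import sqlite3
from concurrent.futures import ProcessPoolExecutor

def read_chunk(id_range):
    lo, hi = id_range
    conn = sqlite3.connect("mydatabase.db")
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS results (id INTEGER PRIMARY KEY)")
        rows = conn.execute(
            "SELECT * FROM results WHERE id BETWEEN ? AND ?", (lo, hi)
        ).fetchall()
        return len(rows)
    finally:
        conn.close()

if __name__ == "__main__":
    ranges = [(1, 1000), (1001, 2000), (2001, 3000)]
    with ProcessPoolExecutor() as pool:
        print(list(pool.map(read_chunk, ranges)))  # rows read per chunk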

You can find the code and complete solution for the benchmarks in my SO answer HERE. Hope that helps!

I would take a look at the y_serial Python module for data persistence: http://yserial.sourceforge.net

It handles deadlock issues surrounding a single SQLite database. If demand on concurrency gets heavy, one can easily set up the class Farm of many databases to diffuse the load over stochastic time.

Hope this helps your project... it should be simple enough to implement in 10 minutes.

I like Evgeny's answer - queues are generally the best way to implement inter-thread communication. For completeness, here are some other options:

  • Close the DB connection when the spawned threads have finished using it. This would fix your OperationalError, but opening and closing connections like this is generally a no-no, due to performance overhead.
  • Don't use child threads. If the once-per-second task is reasonably lightweight, you could get away with doing the fetch and store, then sleeping until the right moment (see the sketch after this list). This is undesirable as fetch and store operations could take >1 sec, and you lose the benefit of multiplexed resources you have with a multi-threaded approach.
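
A minimal sketch of that second option (the URL is a placeholder): fetch, store, then sleep out the remainder of the one-second interval:

import sqlite3
import time
import urllib.request

conn = sqlite3.connect("mydatabase.db")
conn.execute("CREATE TABLE IF NOT EXISTS results (fetched_at REAL, body BLOB)")

while True:
    started = time.time()
    body = urllib.request.urlopen("http://example.com/data").read()  # placeholder URL
    conn.execute("INSERT INTO results VALUES (?, ?)", (started, body))
    conn.commit()
    # Sleep whatever is left of the 1-second interval (skipped if we overran it).
    time.sleep(max(0.0, 1.0 - (time.time() - started)))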

You need to design the concurrency for your program. SQLite has clear limitations and you need to obey them; see the FAQ (also the following question).

"

Please consider checking the value of THREADSAFE in the pragma_compile_options of your SQLite installation. For instance, with

SELECT * FROM pragma_compile_options;

If THREADSAFE is equal to 1, then your SQLite installation is threadsafe, and all you have to do to avoid the threading exception is to create the Python connection with check_same_thread equal to False. In your case, it means

conn = sqlite3.connect("mydatabase.db", checksamethread=False)

That's explained in some detail in Python, SQLite, and thread safety.
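
For what it's worth, you can also read the compile-time options from Python itself (pragma_compile_options requires a reasonably recent SQLite, roughly 3.16+):

import sqlite3

conn = sqlite3.connect(":memory:")
opts = [row[0] for row in conn.execute("SELECT * FROM pragma_compile_options")]
print([o for o in opts if o.startswith("THREADSAFE")])  # e.g. ['THREADSAFE=1']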

The most likely reason you get errors with locked databases is that you must issue

conn.commit()
