How to speed up sqlite3 queries in Python?

Question

I have an SQLite table with a few hundred million rows:

sqlite> create table t1(id INTEGER PRIMARY KEY,stuff TEXT );

I need to query this table by its integer primary key hundreds of millions of times. My code:

conn = sqlite3.connect('stuff.db')
with conn:
    cur = conn.cursor()
    for id in ids:
        try:
            cur.execute("select stuff from t1 where rowid=?",[id])
            stuff_tuple = cur.fetchone()
            #do something with the fetched row
        except:
            pass #for when id is not in t1's key set

Here, ids is a list that may have tens of thousands of elements. Building t1 did not take very long (i.e. ~75K inserts per second). Querying t1 the way I've done it is unacceptably slow (i.e. ~1K queries in 10 seconds).

I am completely new to SQL. What am I doing wrong?

Answer 1

Since you're retrieving values by their keys, it seems like a key/value store would be more appropriate in this case. Relational databases (SQLite included) are definitely feature-rich, but you can't beat the performance of a simple key/value store.

There are several to choose from:

Redis: "advanced key-value store", very fast, optimized for in-memory operation
Cassandra: extremely high performance, scalable, used by multiple high-profile sites
MongoDB: feature-rich, tries to be "middle ground" between relational and NoSQL (and they've started offering free online classes)

And there are many, many more.

Answer 2

You should make one SQL call instead, which should be much faster:

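import sqlite3

conn = sqlite3.connect('stuff.db')
with conn:
    cur = conn.cursor()

    # Build one "?" placeholder per id so the whole lookup is a single query.
    placeholders = ','.join('?' * len(ids))
    for row in cur.execute("SELECT stuff FROM t1 WHERE rowid IN (%s)" % placeholders, ids):
        # do something with the fetched row
        pass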

You do not need a try/except, since ids that are not in the database simply will not show up in the results. If you want to know which ids are missing from the results, you can do:

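ids_res = set()
# The query must select the id as its first column, e.g. "SELECT id, stuff ...";
# rows are plain tuples by default, so index by position rather than by name.
for row in cur.execute(...):
    ids_res.add(row[0])
ids_not_found = set(ids) - ids_res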
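
One caveat with the single-query approach: SQLite caps the number of "?" parameters allowed in one statement (the default has been 999 in older builds and 32766 in newer ones), so a list with tens of thousands of ids may need to be split into batches. The following is a minimal sketch of that idea, not the answer author's code. The helper name fetch_stuff, the chunk_size value, and the in-memory demo table are assumptions made for illustration.

```python
import sqlite3


def fetch_stuff(conn, ids, chunk_size=500):
    """Return {id: stuff} for every id in `ids` that exists in t1.

    `fetch_stuff` and `chunk_size` are illustrative names, not part of sqlite3.
    """
    found = {}
    ids = list(ids)
    for start in range(0, len(ids), chunk_size):
        chunk = ids[start:start + chunk_size]
        # One "?" per id in this batch, kept below SQLite's parameter limit.
        placeholders = ",".join("?" * len(chunk))
        query = "SELECT id, stuff FROM t1 WHERE id IN (%s)" % placeholders
        for row_id, stuff in conn.execute(query, chunk):
            found[row_id] = stuff
    return found


# Demo on a small in-memory table with the same schema as in the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t1(id INTEGER PRIMARY KEY, stuff TEXT)")
conn.executemany(
    "INSERT INTO t1 VALUES (?, ?)",
    [(i, "row %d" % i) for i in range(1, 2001)],
)

wanted = [5, 1500, 999999]
found = fetch_stuff(conn, wanted)
# Ids that were requested but are not present in the table.
missing = set(wanted).difference(found)
print(found)
print(missing)
```

Returning a dict keyed by id keeps the "which ids were not found" check to a single set difference, and the batch size can be tuned as long as it stays under the parameter limit of the SQLite build in use.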

Answer 1 (score 1, accepted): 2012-11-20 06:26:12
Answer 2 (score 0): 2012-10-25 04:08:47
