Python performance: searching a large list vs. sqlite

Question

Let's say I have a database table which consists of three columns: id, field1 and field2. This table may have anywhere between 100 and 100,000 rows in it. I have a Python script that should insert 10-1,000 new rows into this table. However, if the new field1 already exists in the table, it should do an UPDATE, not an INSERT.

Which of the following approaches is more efficient?

1. Do a SELECT field1 FROM table (field1 is unique) and store the result in a list. Then, for each new row, use list.count() to determine whether to INSERT or UPDATE.
2. For each row, run two queries: first SELECT count(*) FROM table WHERE field1="foo", then either the INSERT or the UPDATE.

In other words, is it more efficient to perform n+1 queries and search a list, or 2n queries and get sqlite to do the searching?
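For concreteness, here is a minimal sketch of the two approaches using Python's standard sqlite3 module. The table name `items`, the column types, and the function names are assumptions made for illustration; the question does not specify them.

```python
import sqlite3

def make_db():
    # Hypothetical schema: "items" stands in for the question's table.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE items (id INTEGER PRIMARY KEY, field1 TEXT UNIQUE, field2 TEXT)"
    )
    conn.executemany(
        "INSERT INTO items (field1, field2) VALUES (?, ?)",
        [("a", "old"), ("b", "old")],
    )
    return conn

def upsert_prefetch(conn, new_rows):
    # Approach 1: one SELECT up front, then search the list in Python.
    existing = [row[0] for row in conn.execute("SELECT field1 FROM items")]
    for field1, field2 in new_rows:
        if existing.count(field1):
            conn.execute(
                "UPDATE items SET field2 = ? WHERE field1 = ?", (field2, field1)
            )
        else:
            conn.execute(
                "INSERT INTO items (field1, field2) VALUES (?, ?)", (field1, field2)
            )
            existing.append(field1)

def upsert_query_each(conn, new_rows):
    # Approach 2: ask sqlite about each row before writing it.
    for field1, field2 in new_rows:
        (n,) = conn.execute(
            "SELECT count(*) FROM items WHERE field1 = ?", (field1,)
        ).fetchone()
        if n:
            conn.execute(
                "UPDATE items SET field2 = ? WHERE field1 = ?", (field2, field1)
            )
        else:
            conn.execute(
                "INSERT INTO items (field1, field2) VALUES (?, ?)", (field1, field2)
            )

new_rows = [("b", "new"), ("c", "new")]
results = []
for upsert in (upsert_prefetch, upsert_query_each):
    conn = make_db()
    upsert(conn, new_rows)
    results.append(
        conn.execute("SELECT field1, field2 FROM items ORDER BY field1").fetchall()
    )
    conn.close()
```

Both functions should leave the table in the same state; they differ only in where the existence check happens.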

Answer 1

If I understand your question correctly, it seems like you could simply use SQLite's built-in conflict handling mechanism.

Assuming you have a UNIQUE constraint on field1, you could simply use:

INSERT OR REPLACE INTO table VALUES (...)

The following syntax is also supported (identical semantics):

REPLACE INTO table VALUES (...)

EDIT: I realise that I am not really answering your question, just providing an alternative solution which should be faster.
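As a rough sketch, the INSERT OR REPLACE statement could be driven from Python's sqlite3 module like this. The table name `items` and the column types are assumptions for illustration (`table` itself is a reserved word in SQL).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema with the UNIQUE constraint on field1 that this answer assumes.
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, field1 TEXT UNIQUE, field2 TEXT)"
)
conn.execute("INSERT INTO items (field1, field2) VALUES ('a', 'old')")

new_rows = [("a", "new"), ("b", "new")]
# One statement per row; sqlite resolves the conflict on field1 itself.
conn.executemany(
    "INSERT OR REPLACE INTO items (field1, field2) VALUES (?, ?)", new_rows
)
conn.commit()

rows = conn.execute("SELECT field1, field2 FROM items ORDER BY field1").fetchall()
```

One thing to keep in mind: REPLACE resolves the conflict by deleting the existing row and inserting a new one, so the id of a replaced row is not guaranteed to be preserved. If other tables refer to id, a true UPDATE may be the safer choice.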

Answer 2

I'm not familiar with sqlite, but a general approach like this should work:

If there's a unique index on field1 and you try to insert a value that's already there, you should get an error. If the insert fails, you go with the update.

Pseudocode:

try
{
    insert into table (field1, field2) values (value1, value2)
}
catch(insert fails)
{
    update table set field2=value2 where field1=value1
}
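In Python, the pseudocode above might translate to something like the following sketch. It assumes the standard sqlite3 module, which raises sqlite3.IntegrityError when a UNIQUE constraint is violated; the table name `items` and the helper `insert_or_update` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema; the unique index on field1 is what makes the INSERT fail.
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, field1 TEXT UNIQUE, field2 TEXT)"
)
conn.execute("INSERT INTO items (field1, field2) VALUES ('a', 'old')")

def insert_or_update(conn, field1, field2):
    try:
        conn.execute(
            "INSERT INTO items (field1, field2) VALUES (?, ?)", (field1, field2)
        )
    except sqlite3.IntegrityError:
        # field1 already exists, so fall back to an UPDATE.
        conn.execute(
            "UPDATE items SET field2 = ? WHERE field1 = ?", (field2, field1)
        )

id_before = conn.execute("SELECT id FROM items WHERE field1 = 'a'").fetchone()[0]
for field1, field2 in [("a", "new"), ("b", "new")]:
    insert_or_update(conn, field1, field2)
conn.commit()
id_after = conn.execute("SELECT id FROM items WHERE field1 = 'a'").fetchone()[0]

rows = conn.execute("SELECT field1, field2 FROM items ORDER BY field1").fetchall()
```

Because the existing row is updated in place, its id stays the same.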

Answer 3

I imagine using a Python dictionary would allow for much faster searching than using a Python list. (Just set the values to 0, you won't need them, and hopefully a '0' stores compactly.)
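A small sketch of the difference, with made-up key names and sizes. It also shows a set, which is not what this answer proposes but avoids the need for placeholder values. list.count() scans every element, while dictionary and set lookups are hash-based.

```python
import timeit

# Made-up data standing in for the field1 values fetched from the table.
existing_list = [f"key{i}" for i in range(100_000)]
existing_dict = dict.fromkeys(existing_list, 0)  # values are unused placeholders
existing_set = set(existing_list)

probe = "key99999"  # worst case for the list: the last element

in_list = existing_list.count(probe) > 0  # linear scan of the whole list
in_dict = probe in existing_dict          # hash lookup
in_set = probe in existing_set            # hash lookup, no placeholder values

# Rough timings; absolute numbers depend on the machine.
t_list = timeit.timeit(lambda: existing_list.count(probe), number=20)
t_dict = timeit.timeit(lambda: probe in existing_dict, number=20)
t_set = timeit.timeit(lambda: probe in existing_set, number=20)
print(f"list.count: {t_list:.6f}s  dict: {t_dict:.6f}s  set: {t_set:.6f}s")
```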

As for the larger question, I'm curious too. :)

Answer 4

You appear to be comparing apples with oranges.

A Python list is only useful if your data fit into the address space of the process. Once the data get big, this won't work any more.

Moreover, a Python list is not indexed; for that you should use a dictionary.

Finally, a Python list is non-persistent: it is forgotten when the process quits.

How can you possibly compare these?

4 answers

Answer 1: 9 (accepted), 2010-08-04 10:32:33
Answer 2: 1, 2010-08-04 10:37:25
Answer 3: 0, 2010-08-04 10:25:22
Answer 4: 0, 2010-08-21 07:50:27