Python performance: search large list vs sqlite
Let's say I have a database table which consists of three columns: id, field1 and field2. This table may have anywhere between 100 and 100,000 rows in it. I have a Python script that should insert 10-1,000 new rows into this table. However, if the new field1 already exists in the table, it should do an UPDATE, not an INSERT.
Which of the following approaches is more efficient?
1. SELECT field1 FROM table (field1 is unique) and store that in a list. Then, for each new row, use list.count() to determine whether to INSERT or UPDATE.
2. For each new row, SELECT count(*) FROM table WHERE field1="foo", then either INSERT or UPDATE.

In other words, is it more efficient to perform n+1 queries and search a list, or 2n queries and get sqlite to search?
If I understand your question correctly, it seems like you could simply use SQLite's built-in conflict handling mechanism.
Assuming you have a UNIQUE constraint on field1, you could simply use:
INSERT OR REPLACE INTO table VALUES (...)
The following syntax is also supported (identical semantics):
REPLACE INTO table VALUES (...)
EDIT: I realise that I am not really answering your question, just providing an alternative solution which should be faster.
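As a rough sketch of how this could look from Python's built-in sqlite3 module: the table name `items`, its schema, and the sample rows below are assumptions made for illustration, not something taken from the question.

```python
import sqlite3

# Hypothetical schema: field1 carries the UNIQUE constraint the answer assumes.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, field1 TEXT UNIQUE, field2 TEXT)"
)
conn.execute("INSERT INTO items (field1, field2) VALUES ('foo', 'old')")

# 'foo' already exists and should be overwritten; 'bar' is new.
new_rows = [("foo", "new"), ("bar", "fresh")]

# One batched statement inside a single transaction, with no lookup beforehand.
with conn:
    conn.executemany(
        "INSERT OR REPLACE INTO items (field1, field2) VALUES (?, ?)", new_rows
    )

rows = conn.execute("SELECT field1, field2 FROM items ORDER BY field1").fetchall()
print(rows)
```

One thing to keep in mind: REPLACE resolves the conflict by deleting the existing row and inserting a new one, so the id of a replaced row is not guaranteed to stay the same, and any column left out of the statement falls back to its default.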
I'm not familiar with sqlite but a general approach like this should work:
If there's a unique index on field1 and you're trying to insert a value that's already there, you should get an error. If the insert fails, you go with the update.
Pseudocode:
try
{
    insert into table (field1, field2) values (value1, value2)
}
catch(insert fails)
{
    update table set field2=value2 where field1=value1
}
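In Python with sqlite3, the pseudocode above might translate roughly as follows. This is only a sketch: the `items` table, its schema, and the `upsert` helper are hypothetical names, and it relies on sqlite3 raising `sqlite3.IntegrityError` when a UNIQUE constraint is violated.

```python
import sqlite3

def upsert(conn, field1, field2):
    """Try the INSERT first; fall back to UPDATE if field1 already exists."""
    try:
        conn.execute(
            "INSERT INTO items (field1, field2) VALUES (?, ?)", (field1, field2)
        )
    except sqlite3.IntegrityError:
        # The UNIQUE constraint on field1 rejected the insert.
        conn.execute(
            "UPDATE items SET field2 = ? WHERE field1 = ?", (field2, field1)
        )

# Hypothetical schema and starting data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items (id INTEGER PRIMARY KEY, field1 TEXT UNIQUE, field2 TEXT)"
)
conn.execute("INSERT INTO items (field1, field2) VALUES ('foo', 'old')")
original_id = conn.execute(
    "SELECT id FROM items WHERE field1 = 'foo'"
).fetchone()[0]

# Wrap the whole batch in one transaction.
with conn:
    for f1, f2 in [("foo", "new"), ("bar", "fresh")]:
        upsert(conn, f1, f2)

foo_row = conn.execute(
    "SELECT id, field2 FROM items WHERE field1 = 'foo'"
).fetchone()
print(foo_row)
```

Because the existing row is updated in place, its id and any columns not named in the UPDATE are left untouched.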
I imagine using a Python dictionary would allow for much faster searching than using a Python list. (Just set the values to 0, you won't need them, and hopefully a '0' stores compactly.)
As for the larger question, I'm curious too. :)
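A quick way to check the dictionary idea is to time both lookups. The sketch below uses made-up keys and sizes smaller than the question's upper bound so it runs quickly; the function names are hypothetical. `list.count()` scans the whole list on every call, while a dictionary lookup is a hash probe, so the gap should widen as the table grows.

```python
import timeit

# Made-up data: 10,000 existing keys, 100 incoming keys, half of them already present.
existing = [f"key{i}" for i in range(10_000)]
lookup = dict.fromkeys(existing, 0)  # values are unused; set(existing) would work too
incoming = [f"key{i}" for i in range(9_950, 10_050)]

def find_updates_with_list():
    # list.count() walks the entire list for every incoming key.
    return [k for k in incoming if existing.count(k) > 0]

def find_updates_with_dict():
    # Dictionary membership is a hash lookup, independent of table size on average.
    return [k for k in incoming if k in lookup]

print("list.count :", timeit.timeit(find_updates_with_list, number=10))
print("dict lookup:", timeit.timeit(find_updates_with_dict, number=10))
```

The absolute timings depend on the machine, so none are quoted here; the point is only the relative difference between the two functions.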
You appear to be comparing apples with oranges.
A Python list is only useful if your data fits into the address space of the process. Once the data gets big, this won't work any more.
Moreover, a Python list is not indexed - for that you should use a dictionary.
Finally, a Python list is non-persistent - it is forgotten when the process quits.
How can you possibly compare these?