
Python performance: search large list vs sqlite

Let's say I have a database table which consists of three columns: id, field1 and field2. This table may have anywhere between 100 and 100,000 rows in it. I have a Python script that should insert 10-1,000 new rows into this table. However, if the new field1 already exists in the table, it should do an UPDATE, not an INSERT.

Which of the following approaches is more efficient?

  1. Do a SELECT field1 FROM table (field1 is unique) and store that in a list. Then, for each new row, use list.count() to determine whether to INSERT or UPDATE.
  2. For each row, run two queries: first SELECT count(*) FROM table WHERE field1="foo", then either the INSERT or the UPDATE.

In other words, is it more efficient to perform n+1 queries and search a list, or 2n queries and get SQLite to search?
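To make the comparison concrete, here is a minimal sketch of both approaches using Python's built-in sqlite3 module. The table name t, the helper names and the sample data are assumptions for illustration only (the word "table" itself is an SQL keyword, so a real table needs another name or quoting).

```python
import sqlite3

def make_db():
    # Hypothetical schema matching the question: id, field1 (unique), field2.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (id INTEGER PRIMARY KEY, field1 TEXT UNIQUE, field2 TEXT)"
    )
    conn.executemany(
        "INSERT INTO t (field1, field2) VALUES (?, ?)",
        [("a", "old"), ("b", "old")],
    )
    return conn

def upsert_with_list(conn, new_rows):
    # Approach 1: one SELECT up front, then one write per new row.
    existing = [row[0] for row in conn.execute("SELECT field1 FROM t")]
    for f1, f2 in new_rows:
        if existing.count(f1):  # linear scan of the list
            conn.execute("UPDATE t SET field2 = ? WHERE field1 = ?", (f2, f1))
        else:
            conn.execute("INSERT INTO t (field1, field2) VALUES (?, ?)", (f1, f2))
            existing.append(f1)

def upsert_with_queries(conn, new_rows):
    # Approach 2: a count(*) query plus one write per new row.
    for f1, f2 in new_rows:
        (n,) = conn.execute(
            "SELECT count(*) FROM t WHERE field1 = ?", (f1,)
        ).fetchone()
        if n:
            conn.execute("UPDATE t SET field2 = ? WHERE field1 = ?", (f2, f1))
        else:
            conn.execute("INSERT INTO t (field1, field2) VALUES (?, ?)", (f1, f2))

def dump(conn):
    return conn.execute("SELECT field1, field2 FROM t ORDER BY field1").fetchall()

new_rows = [("b", "new"), ("c", "new")]
conn1, conn2 = make_db(), make_db()
upsert_with_list(conn1, new_rows)
upsert_with_queries(conn2, new_rows)
```

Both functions leave the table in the same state; they differ only in where the lookup happens, which is what would need to be timed on realistic data.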

If I understand your question correctly, it seems like you could simply use SQLite's built-in conflict handling mechanism.

Assuming you have a UNIQUE constraint on field1, you could simply use:

INSERT OR REPLACE INTO table VALUES (...)

The following syntax is also supported (identical semantics):

REPLACE INTO table VALUES (...)
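As a rough illustration of how this might look from Python, the sketch below uses the built-in sqlite3 module. The table name t and the sample values are made up for the example. One thing to keep in mind: on a conflict, REPLACE deletes the existing row and inserts a new one, so the id of a replaced row is not necessarily preserved, and any column you do not supply falls back to its default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema; the UNIQUE constraint on field1 is what triggers REPLACE.
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, field1 TEXT UNIQUE, field2 TEXT)"
)
conn.execute("INSERT INTO t (field1, field2) VALUES ('foo', 'old')")
old_id = conn.execute("SELECT id FROM t WHERE field1 = 'foo'").fetchone()[0]

# One statement per row, no prior lookup: 'foo' is replaced, 'bar' is inserted.
conn.executemany(
    "INSERT OR REPLACE INTO t (field1, field2) VALUES (?, ?)",
    [("foo", "new"), ("bar", "new")],
)
conn.commit()

rows = conn.execute("SELECT field1, field2 FROM t ORDER BY field1").fetchall()
new_id = conn.execute("SELECT id FROM t WHERE field1 = 'foo'").fetchone()[0]
print(rows)
print("id before:", old_id, "id after:", new_id)
```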

EDIT: I realise that I am not really answering your question, just providing an alternative solution which should be faster.

I'm not familiar with sqlite but a general approach like this should work:

If there's a unique index on field1 and you're trying to insert a value that's already there you should get an error. If insert fails, you go with the update.

Pseudocode:

try
{
    insert into table (field1, field2) values (value1, value2)
}
catch(insert fails)
{
    update table set field2=value2 where field1=value1
}
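In Python with the built-in sqlite3 module, the pseudocode above could be sketched roughly as follows. The table name t and the function name insert_or_update are assumptions for the example; sqlite3.IntegrityError is the exception the module raises when a UNIQUE constraint is violated.

```python
import sqlite3

def insert_or_update(conn, value1, value2):
    """Try the INSERT first; fall back to UPDATE if field1 already exists."""
    try:
        conn.execute(
            "INSERT INTO t (field1, field2) VALUES (?, ?)", (value1, value2)
        )
        return "inserted"
    except sqlite3.IntegrityError:
        # The UNIQUE constraint on field1 rejected the row, so update it instead.
        conn.execute(
            "UPDATE t SET field2 = ? WHERE field1 = ?", (value2, value1)
        )
        return "updated"

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (id INTEGER PRIMARY KEY, field1 TEXT UNIQUE, field2 TEXT)"
)
first = insert_or_update(conn, "foo", "old")
second = insert_or_update(conn, "foo", "new")
conn.commit()
stored = conn.execute("SELECT field2 FROM t WHERE field1 = 'foo'").fetchone()[0]
```

Unlike INSERT OR REPLACE, this keeps the existing row (and its id) and only changes field2.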

I imagine using a Python dictionary would allow for much faster searching than using a Python list. (Just set the values to 0, you won't need them, and hopefully a '0' stores compactly.)
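A small sketch of that idea, with made-up keys: list.count() scans the whole list on every call, while a dictionary lookup is a hash lookup that takes roughly constant time on average. A set is swapped in here as the equivalent structure when no values are needed; that substitution is mine, not part of the suggestion above.

```python
import timeit

# Hypothetical data standing in for the field1 values read from the table.
existing_list = [f"key{i}" for i in range(100_000)]
existing_dict = dict.fromkeys(existing_list, 0)  # values are unused, as suggested
existing_set = set(existing_list)                # same idea without dummy values

def known_in_list(key):
    return existing_list.count(key) > 0  # O(n) scan

def known_in_dict(key):
    return key in existing_dict          # average O(1) hash lookup

def known_in_set(key):
    return key in existing_set           # average O(1) hash lookup

if __name__ == "__main__":
    # Timings vary by machine, so no figures are claimed here.
    for fn in (known_in_list, known_in_dict, known_in_set):
        seconds = timeit.timeit(lambda: fn("key99999"), number=100)
        print(fn.__name__, seconds)
```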

As for the larger question, I'm curious too. :)

You appear to be comparing apples with oranges.

A Python list is only useful if your data fit into the address space of the process. Once the data get big, this won't work any more.

Moreover, a Python list is not indexed - for that you should use a dictionary.

Finally, a Python list is non-persistent - it is forgotten when the process quits.

How can you possibly compare these?
