Querying with subqueries on a SQLite database is approx. 10x slower with newer versions of System.Data.SQLite/sqlite3.dll

(See update below)

I am having an issue with slow query performance (~5 sec) when querying a very simple SQLite table of about 500,000 rows from within a C#/.NET application.

I have tried the exact same query on exactly the same database using LinqPad, as well as 2 database browsers (both based on QtSql), and it runs 10x faster (~0.5 sec). Same query, same db, different apps; only mine doesn't run fast.

It makes a negligible difference whether I'm returning values or just a Count(*).

I've tried:

  • building for each of .NET 3.5/4/4.5
  • building for each of AnyCPU/x86/x64
  • using each of System.Data.SQLite, sqlite-net, as well as directly accessing a sqlite3 dll via COM
  • building for each of WPF/WinForms
  • different variations of the queries

None of these makes any noticeable difference to the query time.

I know that rewriting the query using JOINs may help, but what I can't figure out is why the same query works fine in LinqPad and the SQL browsers but not from any app I try to create. I must be missing something pretty fundamental.

Example Table:

"CREATE TABLE items(id INTEGER PRIMARY KEY, id1 INTEGER, id2 INTEGER, value INTEGER)"

Example Query String (though basically any query using a subquery takes a long time):

SELECT count(*) 
FROM items WHERE 
id2 IN 
(
    SELECT DISTINCT id2 FROM items WHERE id1 IN 
    (
        SELECT DISTINCT id1 FROM items WHERE id2 = 100000 AND value = 10
    )
    AND value = 10
) 
AND value = 10 
GROUP BY id2

I know this could probably be rewritten using JOINs and indexing to speed it up, but the fact remains that this query runs significantly faster from other apps. What am I missing here as to why the same query runs so much slower no matter what I try?
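For reference, here is one way the JOIN rewrite mentioned above could look. This is only a sketch: it uses Python's built-in sqlite3 module instead of the C# app, the random sample data is made up, and the JOIN query is my own rewrite rather than anything from the original post. The id2 column is added to the SELECT list of both queries so the results can be compared row by row.

```python
import random
import sqlite3

# The question's query, with id2 added to the output for comparison.
IN_QUERY = """
SELECT id2, count(*) FROM items
WHERE id2 IN (
    SELECT DISTINCT id2 FROM items WHERE id1 IN (
        SELECT DISTINCT id1 FROM items WHERE id2 = 100000 AND value = 10
    ) AND value = 10
) AND value = 10
GROUP BY id2
ORDER BY id2
"""

# Hypothetical JOIN rewrite: the derived table s holds the distinct id2
# values reachable through a shared id1, then items is joined against it.
JOIN_QUERY = """
SELECT c.id2, count(*)
FROM items AS c
JOIN (
    SELECT DISTINCT b.id2
    FROM items AS a
    JOIN items AS b ON b.id1 = a.id1
    WHERE a.id2 = 100000 AND a.value = 10 AND b.value = 10
) AS s ON s.id2 = c.id2
WHERE c.value = 10
GROUP BY c.id2
ORDER BY c.id2
"""

rng = random.Random(1)  # fixed seed so the sample data is repeatable
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items(id INTEGER PRIMARY KEY, id1 INTEGER, id2 INTEGER, value INTEGER)"
)
rows = [
    (rng.randrange(50), 100000 + rng.randrange(20), rng.choice((10, 20)))
    for _ in range(2000)
]
rows.append((7, 100000, 10))  # guarantee at least one matching seed row
conn.executemany("INSERT INTO items(id1, id2, value) VALUES (?, ?, ?)", rows)
conn.commit()

in_rows = conn.execute(IN_QUERY).fetchall()
join_rows = conn.execute(JOIN_QUERY).fetchall()
print("same result:", in_rows == join_rows)
```

Whether the JOIN form is actually faster on the real 500,000-row table would still need to be measured.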

UPDATE: It seems the version of SQLite has something to do with the issue. Using the legacy System.Data.SQLite v1.0.66.0, the query runs just as fast as in the other apps, but with a more recent version it is slow. I haven't pinpointed exactly which version changed this, but I am pretty sure it has to do with the underlying sqlite3 version rather than System.Data.SQLite specifically. If anyone knows what could have changed to make subqueries slow down so much in this situation, or if there are settings that can make subqueries run faster in new versions of SQLite, please let me know!
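To pin down which underlying SQLite library each app is really using, the built-in SQL function sqlite_version() can be run through any connection. The sketch below uses Python's sqlite3 module purely for illustration; since sqlite_version() is plain SQL, the same SELECT should work from System.Data.SQLite, LinqPad, or a database browser.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Ask the SQLite engine itself which library version is loaded.
runtime_version = conn.execute("SELECT sqlite_version()").fetchone()[0]

print("SQLite library version (via SQL):", runtime_version)
print("SQLite library version (via module attribute):", sqlite3.sqlite_version)
```

Comparing this value across the fast and slow apps would narrow down the version range where the behaviour changed.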

Again, the query is just an example; it is not ideal and is partially redundant. The question is more about why it runs fast in one environment and not in the other.

Thanks in advance for any additional input!

UPDATE: SOLVED

See my answer below.

Some suggestions:

You say you don't want to rework your queries or add indexes. That is the obvious thing to do here. Without any indexes, SQLite has to scan your 500,000-row table at least once (or, more likely, multiple times).

Based on your query above, I would add indexes to columns id1 and id2.
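A quick way to check that such indexes are picked up is EXPLAIN QUERY PLAN. The following is a minimal sketch using Python's built-in sqlite3 module on an empty in-memory copy of the table; the index names idx_items_id1 and idx_items_id2 are made up for illustration, and the exact wording of the plan text differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items(id INTEGER PRIMARY KEY, id1 INTEGER, id2 INTEGER, value INTEGER)"
)

QUERY = "SELECT count(*) FROM items WHERE id2 = 100000 AND value = 10"


def plan(connection, sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in connection.execute("EXPLAIN QUERY PLAN " + sql)]


# Without indexes the planner has to scan the whole table.
plan_before = plan(conn, QUERY)

# Hypothetical index names; one index per column used in the lookups.
conn.execute("CREATE INDEX idx_items_id1 ON items(id1)")
conn.execute("CREATE INDEX idx_items_id2 ON items(id2)")

# With the index in place the planner can search by id2 instead.
plan_after = plan(conn, QUERY)

print("before:", plan_before)
print("after: ", plan_after)
```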

One other thing is that your query above seems a little redundant. Maybe you have your reasons, but I cannot see why the query should be so complicated. A possibly simpler query (note that it returns only the count for id2 = 100000, so check that this is what you need):

select count(*) from items where id2 = 100000 and value = 10

Try:

SELECT ID1.id2, count(*) 
FROM items ID2
JOIN items ID1
  on ID2.id2 = ID1.id1
 and ID1.id2 = 100000 
 and ID1.value = 10 
 and ID2.value = 10
group by ID1.id2

OK, it turns out this was caused by Automatic Indexing, which was introduced in SQLite 3.7.0. In my situation, running subqueries against this kind of table without indexes meant that the time SQLite spent creating the automatic indexes was the extra overhead the queries were experiencing.

The solution was to use:

PRAGMA automatic_index=OFF;

at the start of any query that uses the "IN" clause.
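As an illustration of the pragma, here is a small sketch using Python's built-in sqlite3 module rather than System.Data.SQLite. It runs on an empty in-memory copy of the table, so it shows only the setting and the query plan, not the timing difference. The pragma is a per-connection setting, so issuing it once after opening the connection is enough. Depending on the SQLite version, the first plan may or may not mention an AUTOMATIC index for this particular query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE items(id INTEGER PRIMARY KEY, id1 INTEGER, id2 INTEGER, value INTEGER)"
)

# The example query from the question.
IN_QUERY = """
SELECT count(*) FROM items
WHERE id2 IN (
    SELECT DISTINCT id2 FROM items WHERE id1 IN (
        SELECT DISTINCT id1 FROM items WHERE id2 = 100000 AND value = 10
    ) AND value = 10
) AND value = 10
GROUP BY id2
"""


def plan(connection, sql):
    """Return the human-readable detail column of EXPLAIN QUERY PLAN."""
    return [row[-1] for row in connection.execute("EXPLAIN QUERY PLAN " + sql)]


plan_default = plan(conn, IN_QUERY)

# Turn automatic (transient) indexes off for this connection only.
conn.execute("PRAGMA automatic_index=OFF")
setting_after = conn.execute("PRAGMA automatic_index").fetchone()[0]

plan_off = plan(conn, IN_QUERY)

print("automatic_index after PRAGMA:", setting_after)
print("plan with default setting:", plan_default)
print("plan with automatic_index=OFF:", plan_off)
```

From C#, the same PRAGMA text would presumably be sent as an ordinary command on the open connection before the query is run.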

Creating indexes on the columns may also solve this (untested); however, in this particular situation the additional size/disk usage needed for the indexes is not worth it.
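The disk cost of the indexes can be estimated before deciding. The sketch below is only an illustration under assumptions: it uses Python's built-in sqlite3 module, a temporary file, 20,000 synthetic rows instead of the real 500,000, and made-up index names. It compares the database size (page_count times page_size) before and after creating the two indexes.

```python
import os
import sqlite3
import tempfile


def db_bytes(connection):
    """Database size in bytes, computed from the page count and page size."""
    page_count = connection.execute("PRAGMA page_count").fetchone()[0]
    page_size = connection.execute("PRAGMA page_size").fetchone()[0]
    return page_count * page_size


with tempfile.TemporaryDirectory() as tmp:
    conn = sqlite3.connect(os.path.join(tmp, "items.db"))
    conn.execute(
        "CREATE TABLE items(id INTEGER PRIMARY KEY, id1 INTEGER, id2 INTEGER, value INTEGER)"
    )
    # Synthetic rows; the real data distribution is unknown.
    conn.executemany(
        "INSERT INTO items(id1, id2, value) VALUES (?, ?, ?)",
        ((i % 1000, 100000 + i % 100, 10) for i in range(20000)),
    )
    conn.commit()
    size_before = db_bytes(conn)

    # Hypothetical index names, one per lookup column.
    conn.execute("CREATE INDEX idx_items_id1 ON items(id1)")
    conn.execute("CREATE INDEX idx_items_id2 ON items(id2)")
    conn.commit()
    size_after = db_bytes(conn)
    conn.close()

print("bytes before indexes:", size_before)
print("bytes after indexes: ", size_after)
```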

This would also suggest that the LinqPad SQLite plugin and the database viewers I was using are based on old SQLite versions.

More information can be found at:

http://www.sqlite.org/src/info/8011086c85c6c4040

http://www.sqlite.org/optoverview.html#autoindex

Thanks to everyone who responded.

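Notice: Technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.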

 