Create an in-memory readonly cache for data stored in SQL Server

I have a problem concerning application performance: I have many tables, each with millions of records. I run SELECT statements over them using joins, WHERE clauses and ORDER BY on different criteria (specified by the user at runtime). I want my records paged, but no matter what I do with my SQL statements I cannot match the performance of serving pages directly from memory. The problem appears when I have to filter records by criteria that are specified dynamically at runtime. I have tried ROW_NUMBER() combined with a "WHERE RowNo BETWEEN" clause, CTEs, temp tables and so on. Those SQL solutions perform well only if I don't include filtering. Keep in mind that I want the solution to be as generic as possible (imagine that my app has several lists, each virtually presenting millions of paged records, and those records are built with very complex SQL statements).

All my tables have a primary key of type INT.

So I came up with an idea: why not create a "server" only for SELECT statements? The server first loads all records from all tables and stores them in HashSet<T> collections, where each T has an Id property, GetHashCode() returns that Id, and Equals is implemented so that two records are "equal" only if their Ids are equal (don't scream, you will see later why I am not using all the record data for hashing and comparisons).
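A minimal sketch of such a record type, assuming a hypothetical Customer class (the post does not show its real types). Only Id takes part in Equals and GetHashCode, so a probe object that carries nothing but the key can find or remove the cached row:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical cached row type; the class name and the Name column are assumptions.
public sealed class Customer : IEquatable<Customer>
{
    public int Id { get; set; }
    public string Name { get; set; }

    // Two rows are "equal" when their primary keys match, whatever the other columns hold.
    public bool Equals(Customer other)
    {
        return other != null && other.Id == Id;
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as Customer);
    }

    // The INT primary key is already a well-distributed hash code.
    public override int GetHashCode()
    {
        return Id;
    }
}

public static class Demo
{
    public static void Main()
    {
        var cache = new HashSet<Customer>();
        cache.Add(new Customer { Id = 1, Name = "old name" });

        // A probe that carries only the key is enough to locate the cached row.
        var probe = new Customer { Id = 1 };
        Console.WriteLine(cache.Contains(probe)); // True

        // Replacing a stale row: remove by key, then add the fresh version.
        cache.Remove(probe);
        cache.Add(new Customer { Id = 1, Name = "new name" });
        Console.WriteLine(cache.Count); // 1
    }
}
```

Id-only equality is what makes the later "remove, then re-insert" refresh cheap: the stale row can be evicted without knowing its old column values.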

So far so good, but there's a problem: how can I sync my in-memory collections with the database records? The idea is that I must find a solution in which I load only differential changes. So I invented a changelog table for each table that I want to cache. Into this changelog I only insert rows: some mark dirty rows (updates or deletes) and some record newly inserted ids, and the whole mechanism is implemented with triggers. Whenever an in-memory select comes in, I first check whether I must sync something (by querying the changelog). If something must be applied, I load the changelog, apply those changes in memory and finally clear the changelog (or maybe remember the highest changelog id that I have applied).
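The "remember the highest changelog id" variant could look roughly like the sketch below. LogRow, ChangeLogCursor and the readAfter callback are hypothetical names; the callback stands in for a query against the changelog table, and LogId is assumed to be an increasing identity column.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical changelog row; LogId is assumed to be an increasing identity column.
public sealed class LogRow
{
    public long LogId { get; set; }
    public int RowId { get; set; }
}

public sealed class ChangeLogCursor
{
    // Stands in for a query that returns changelog rows with LogId greater than the argument.
    private readonly Func<long, List<LogRow>> readAfter;
    private readonly object gate = new object();
    private long lastAppliedLogId;

    public ChangeLogCursor(Func<long, List<LogRow>> readAfter)
    {
        this.readAfter = readAfter;
    }

    // Returns only the entries newer than the high-water mark, then advances the mark.
    public List<LogRow> TakeNewEntries()
    {
        lock (gate)
        {
            var rows = readAfter(lastAppliedLogId);
            if (rows.Count > 0)
            {
                lastAppliedLogId = rows.Max(r => r.LogId);
            }
            return rows;
        }
    }
}

public static class Demo
{
    public static void Main()
    {
        // An in-memory list plays the role of the changelog table.
        var table = new List<LogRow>
        {
            new LogRow { LogId = 1, RowId = 10 },
            new LogRow { LogId = 2, RowId = 11 },
        };
        var cursor = new ChangeLogCursor(after => table.Where(r => r.LogId > after).ToList());

        Console.WriteLine(cursor.TakeNewEntries().Count); // 2
        Console.WriteLine(cursor.TakeNewEntries().Count); // 0
        table.Add(new LogRow { LogId = 3, RowId = 12 });
        Console.WriteLine(cursor.TakeNewEntries().Count); // 1
    }
}
```

Keeping a high-water mark avoids deleting from the changelog on every read. One thing to check in a real system is that concurrent transactions may commit out of id order, so a row with a lower LogId could become visible after the mark has already moved past it.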

To apply the changelog in O(N), where N is the changelog size, I am using this algorithm:

For each log entry, identify the in-memory Dictionary<int, T> it belongs to, where the key is the primary key. Then:

  1. If it is a delete entry, call dictionary.Remove(id), which is O(1).
  2. If it is an update entry, also call dictionary.Remove(id), which is O(1), and move the id into a "to be inserted" collection.
  3. If it is an insert entry, move the id into the "to be inserted" collection.

Finally, refresh the cache by selecting all rows from the corresponding table where Id IN ("to be inserted").
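The steps above can be sketched as follows. ChangeKind, ChangeLogEntry, CacheSync and the loadByIds callback are hypothetical names; the callback stands in for the final "SELECT ... WHERE Id IN (...)" query. Dropping a deleted id from the pending set is an addition not in the original description, and it assumes the log is processed in order.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum ChangeKind { Insert, Update, Delete }

// Hypothetical changelog entry: which row changed, and how.
public sealed class ChangeLogEntry
{
    public int RowId { get; set; }
    public ChangeKind Kind { get; set; }
}

public static class CacheSync
{
    // loadByIds is a hypothetical callback standing in for
    // "SELECT ... FROM table WHERE Id IN (...)".
    public static void Apply<T>(
        Dictionary<int, T> cache,
        IEnumerable<ChangeLogEntry> log,
        Func<ICollection<int>, IEnumerable<KeyValuePair<int, T>>> loadByIds)
    {
        var toReload = new HashSet<int>(); // the "to be inserted" collection

        foreach (var entry in log)
        {
            switch (entry.Kind)
            {
                case ChangeKind.Delete:
                    cache.Remove(entry.RowId);    // O(1)
                    toReload.Remove(entry.RowId); // assumption: skip rows deleted later in the same batch
                    break;
                case ChangeKind.Update:
                    cache.Remove(entry.RowId);    // O(1), evict the stale version
                    toReload.Add(entry.RowId);
                    break;
                case ChangeKind.Insert:
                    toReload.Add(entry.RowId);
                    break;
            }
        }

        if (toReload.Count == 0)
        {
            return;
        }

        // One round trip fetches the current version of every inserted or updated row.
        foreach (var row in loadByIds(toReload))
        {
            cache[row.Key] = row.Value;
        }
    }
}

public static class Demo
{
    public static void Main()
    {
        // "db" plays the role of the real table after the changes were made.
        var db = new Dictionary<int, string> { { 2, "b2" }, { 3, "c" } };
        var cache = new Dictionary<int, string> { { 1, "a" }, { 2, "b" } };
        var log = new[]
        {
            new ChangeLogEntry { RowId = 1, Kind = ChangeKind.Delete },
            new ChangeLogEntry { RowId = 2, Kind = ChangeKind.Update },
            new ChangeLogEntry { RowId = 3, Kind = ChangeKind.Insert },
        };

        CacheSync.Apply(cache, log, ids => db.Where(kv => ids.Contains(kv.Key)));

        var summary = cache.OrderBy(kv => kv.Key).Select(kv => kv.Key + "=" + kv.Value);
        Console.WriteLine(string.Join(", ", summary.ToArray())); // 2=b2, 3=c
    }
}
```

Each log entry costs a constant number of hash operations, so the in-memory part stays O(N) in the changelog size; the cost of the reload query depends on how many distinct ids end up in the pending set.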

For filtering, I compile expression trees into Func<T, List<FilterCriterias>, bool> delegates. With this mechanism, filtering runs much faster than it does in SQL.
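One possible shape for such a compiled filter is sketched below. The post does not show FilterCriterias, so its PropertyName and Value members, the Order type and the FilterCompiler helper are all assumptions, and only equality tests combined with AND are covered. The column set is fixed when the delegate is compiled; the values to compare against are read from the list on every call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical shape; the original post does not show this type.
public sealed class FilterCriterias
{
    public string PropertyName { get; set; }
    public object Value { get; set; }
}

// Hypothetical cached row type.
public sealed class Order
{
    public int Id { get; set; }
    public string Status { get; set; }
    public int CustomerId { get; set; }
}

public static class FilterCompiler
{
    // Builds and compiles:
    //   (row, criteria) => true && row.P0 == (T0)criteria[0].Value && row.P1 == (T1)criteria[1].Value ...
    // "shape" only supplies the property names; values come from the list passed at call time.
    public static Func<T, List<FilterCriterias>, bool> Compile<T>(List<FilterCriterias> shape)
    {
        var row = Expression.Parameter(typeof(T), "row");
        var criteria = Expression.Parameter(typeof(List<FilterCriterias>), "criteria");
        Expression body = Expression.Constant(true);

        for (int i = 0; i < shape.Count; i++)
        {
            var property = Expression.Property(row, shape[i].PropertyName);
            // criteria[i].Value, converted from object to the property's type
            var item = Expression.Property(criteria, "Item", Expression.Constant(i));
            var value = Expression.Convert(Expression.Property(item, "Value"), property.Type);
            body = Expression.AndAlso(body, Expression.Equal(property, value));
        }

        return Expression
            .Lambda<Func<T, List<FilterCriterias>, bool>>(body, row, criteria)
            .Compile();
    }
}

public static class Demo
{
    public static void Main()
    {
        var criteria = new List<FilterCriterias>
        {
            new FilterCriterias { PropertyName = "Status", Value = "Open" },
            new FilterCriterias { PropertyName = "CustomerId", Value = 7 },
        };
        var filter = FilterCompiler.Compile<Order>(criteria);

        var orders = new List<Order>
        {
            new Order { Id = 1, Status = "Open", CustomerId = 7 },
            new Order { Id = 2, Status = "Closed", CustomerId = 7 },
            new Order { Id = 3, Status = "Open", CustomerId = 8 },
        };

        foreach (var order in orders.Where(o => filter(o, criteria)))
        {
            Console.WriteLine(order.Id); // prints 1
        }
    }
}
```

Compiling is the expensive step, so a delegate like this would normally be built once per column combination and reused. The boxed Value must have exactly the property's type (an int for an int column), otherwise the conversion throws at call time.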

I know that SQL Server 2012 has caching support and that the upcoming version will support even more, but my client has SQL Server 2005, so I can't benefit from any of that.

My question: what do you think? Is this a bad idea? Is there a better approach?

The developers of SQL Server did a very good job. I think it is nearly impossible to outsmart it.

Unless your data has some kind of implicit structure that might help to speed things up and that the optimizer cannot be aware of, such "I do my own speedy trick" approaches normally won't help.

Performance problems should always be solved first where they occur:

  1. the table structures and relations
  2. indexes and statistics
  3. the quality of the SQL statements

Even many millions of rows are no problem if the design and the queries are good.

If your queries do a lot of computation, or you need to retrieve data from tricky structures (nested lists with recursive reads, XML and so on), I'd go the data warehouse route and write some denormalized tables for quick selects. Of course, you will have to deal with the fact that you are reading "old" data. If your data does not change much, you could use triggers to push every change to the denormalized structure immediately. But this depends on your actual situation.

If you want, you could post one of your slow queries together with the relevant structure details and ask for a review. There are dedicated groups on Stack Exchange, such as "Code Review". If it's not too big, you might try it here as well.
