
Improving search performance in large data sets

In a WPF application already in production, users have a window where they choose a client. It shows a list of all the clients and a TextBox where they can search for a client.

As the client base grew, this became exceptionally slow: around 1 minute for an operation that happens around 100 times each day.

Currently, SQL Server Management Studio reports that the query select id, name, birth_date from client takes 41 seconds to execute (around 130,000 rows).

Are there any suggestions on how to improve this time? Indexes, ORMs, or direct SQL queries in code?

Currently I'm using .NET Framework 3.5 and LinqToSql.

If your query is actually SELECT id, name, birth_date from client (i.e., no WHERE clause), there is very little you can do to speed it up short of new hardware. SQL Server will have to do a table scan to get all of the data. Even with a covering index, it will have to scan an index just as big as the table.

What you need to ask yourself is: is a list of 130,000 clients really useful for your users? Is anybody really going to scroll through to the 75,613th entry in a list to find the client they want? Probably not. I would go with the search option only. At least then you can add indexes that make sense for those queries.
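To make the "search only, then index" idea concrete, here is a minimal sketch using Python's built-in sqlite3 as a stand-in for SQL Server. The table shape comes from the question; the index name idx_client_name, the helper names, and the generated client names are made up for illustration. A prefix search written as a range on name is something an index on that column can serve, and the plan output should confirm that the index is used:

```python
import sqlite3


def make_demo_db(n=2000):
    """Build an in-memory table shaped like the question's `client` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE client (id INTEGER PRIMARY KEY, name TEXT, birth_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO client (name, birth_date) VALUES (?, ?)",
        ((f"Client{i:05d}", "1980-01-01") for i in range(n)),
    )
    # Hypothetical index name; it exists only to support searches by name.
    conn.execute("CREATE INDEX idx_client_name ON client (name)")
    return conn


def _prefix_bounds(prefix):
    # Assumption: U+FFFF sorts after any character that follows the prefix.
    return prefix, prefix + "\uffff"


def search_by_prefix(conn, prefix):
    """Return clients whose name starts with `prefix` (case-sensitive)."""
    return conn.execute(
        "SELECT id, name, birth_date FROM client "
        "WHERE name >= ? AND name < ? ORDER BY name",
        _prefix_bounds(prefix),
    ).fetchall()


def plan_for_prefix(conn, prefix):
    """Return the query planner's description of the prefix search."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN SELECT id, name, birth_date FROM client "
        "WHERE name >= ? AND name < ?",
        _prefix_bounds(prefix),
    ).fetchall()
    # The human-readable detail is the last column of each plan row.
    return " ".join(str(row[-1]) for row in rows)


if __name__ == "__main__":
    conn = make_demo_db()
    print(len(search_by_prefix(conn, "Client0012")), "matches")
    print(plan_for_prefix(conn, "Client0012"))
```

On SQL Server the analogous check would be the execution plan for a search with a WHERE clause on name; the exact plan text differs between engines, so treat the output here as illustrative only.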

If you absolutely do need the entire list, try loading it lazily in chunks. Start with the first 500 records and then add more as the user moves the scroll bar. That way the initial load time is reduced and the user only loads the data that is necessary.
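A rough sketch of chunked loading, again with sqlite3 standing in for the real database. It pages by "id greater than the last id already shown" rather than by row offset, which is one of several ways to do it; the function names and the assumption that ids are positive integers are mine, not the asker's:

```python
import sqlite3


def make_demo_db(n=1200):
    """Build an in-memory table shaped like the question's `client` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE client (id INTEGER PRIMARY KEY, name TEXT, birth_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO client (name, birth_date) VALUES (?, ?)",
        ((f"Client{i:05d}", "1980-01-01") for i in range(n)),
    )
    return conn


def iter_client_chunks(conn, chunk_size=500):
    """Yield lists of (id, name, birth_date) with at most `chunk_size` rows each."""
    last_id = 0  # assumption: ids are positive integers
    while True:
        rows = conn.execute(
            "SELECT id, name, birth_date FROM client "
            "WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, chunk_size),
        ).fetchall()
        if not rows:
            return
        yield rows
        # Resume after the last row that was already handed to the caller.
        last_id = rows[-1][0]


if __name__ == "__main__":
    chunks = iter_client_chunks(make_demo_db())
    first = next(chunks)  # only the first 500 rows have been read so far
    print(len(first), "rows loaded initially")
```

Because the generator is lazy, the UI would call next() only when the user scrolls near the end of what is already displayed.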

Why do you need the list of all the clients? Couldn't you just have the search TextBox that you describe and handle the search query on the server side? There you can set a cap on the maximum number of rows returned for an individual client search (e.g. max 500 matches).
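As a sketch of the capped server-side search (sqlite3 as a stand-in; search_clients, MAX_MATCHES and the "truncated" flag are illustrative names, and 500 is simply the example figure above), the query asks for one row more than the cap so the caller can tell the user that more matches exist:

```python
import sqlite3

MAX_MATCHES = 500  # cap taken from the example figure in the answer


def make_demo_db(n=1200):
    """Build an in-memory table shaped like the question's `client` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE client (id INTEGER PRIMARY KEY, name TEXT, birth_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO client (name, birth_date) VALUES (?, ?)",
        ((f"Client{i:05d}", "1980-01-01") for i in range(n)),
    )
    return conn


def search_clients(conn, term, max_rows=MAX_MATCHES):
    """Return (rows, truncated) for clients whose name contains `term`.

    Note: wildcard characters inside `term` are not escaped in this sketch.
    """
    rows = conn.execute(
        "SELECT id, name, birth_date FROM client "
        "WHERE name LIKE '%' || ? || '%' ORDER BY name LIMIT ?",
        (term, max_rows + 1),  # one extra row reveals whether more matches exist
    ).fetchall()
    return rows[:max_rows], len(rows) > max_rows


if __name__ == "__main__":
    rows, truncated = search_clients(make_demo_db(), "Client")
    print(len(rows), "rows shown; more available:", truncated)
```

The search term is passed as a parameter rather than concatenated into the SQL text, which also avoids injection problems.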

Alternatively, some efficiency gains may be achieved by caching the client data list on the web server.

Indexing should not help, based on your query. You could use a view that caches the sorted query (assuming you're not ordering by the id?), but given SQL Server's built-in query cache for ad hoc queries you're probably not going to see much gain there either. The ORM does add some overhead, but there are several tutorials out there for cutting that cost (e.g. http://www.sidarok.com/web/blog/content/2008/05/02/10-tips-to-improve-your-linq-to-sql-application-performance.html). The main points there that apply to you are to use compiled queries wherever possible and to turn off optimistic concurrency for read-only data.

An even bigger performance gain could be realized by having your clients not hit the db directly. If you add a service layer in there (not necessarily a web service, but it could be), then the service class or application could put some smart caching in place, which would help by an order of magnitude for read-only queries like this.
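One possible shape for that caching, sketched in Python: a small time-based cache that sits in front of whatever actually loads the client list. The class name, the 60-second lifetime, and the loader callback are all assumptions for illustration; a real service layer would also need to think about thread safety and about invalidating the cache when clients are added or edited.

```python
import time


class CachedClientList:
    """Serve a read-mostly list from memory, reloading it after `ttl_seconds`."""

    def __init__(self, loader, ttl_seconds=60.0, clock=time.monotonic):
        self._loader = loader      # hypothetical callable that queries the database
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so the expiry logic can be tested
        self._rows = None
        self._loaded_at = None

    def get(self):
        """Return the cached rows, hitting the loader only when they are stale."""
        now = self._clock()
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            self._rows = self._loader()
            self._loaded_at = now
        return self._rows

    def invalidate(self):
        """Force the next get() to reload, e.g. after a client is edited."""
        self._loaded_at = None


if __name__ == "__main__":
    cache = CachedClientList(lambda: [(1, "Ann", "1980-01-01")])
    cache.get()  # first call runs the loader
    cache.get()  # second call is answered from memory
```

With around 100 lookups a day against data that rarely changes, even a short lifetime would mean most requests never reach the database.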

In SQL Server Management Studio, open a new query. In the Query menu, click "Include Client Statistics".

Run the query just as you would from code. It will display the results and also a tab next to the results called "Client Statistics".

Click that and look at the time under "Wait time on server replies". This is in ms, and it's the time the server was actually executing.

I just ran this query:

select  firstname, lastname from leads

It took 3ms on the server to fetch 301,000 records.

The "Total Execution Time" was something like 483ms, which includes the time for SSMS to actually get the data and process it. My query took something like 2.5-3s to run in SSMS, and the remaining time (2500ms or so) was actually for SSMS to paint the results, etc.

My guess is that the 41 seconds is probably not being spent on the SQL Server, as 130,000 records really isn't that much. Your 41 seconds is probably largely being spent on everything that happens after SQL Server returns the results.
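The same split can be measured from application code. Below is a small sketch (sqlite3 as a stand-in; time_stages and the build_view callback are hypothetical names, with build_view representing whatever the application does with the rows, such as filling the list control) that times fetching the rows separately from processing them:

```python
import sqlite3
import time


def make_demo_db(n=1000):
    """Build an in-memory table shaped like the question's `client` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE client (id INTEGER PRIMARY KEY, name TEXT, birth_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO client (name, birth_date) VALUES (?, ?)",
        ((f"Client{i:05d}", "1980-01-01") for i in range(n)),
    )
    return conn


def time_stages(conn, build_view):
    """Time the data fetch and the client-side processing as separate stages."""
    t0 = time.perf_counter()
    rows = conn.execute("SELECT id, name, birth_date FROM client").fetchall()
    t1 = time.perf_counter()
    build_view(rows)  # stand-in for populating the UI with the rows
    t2 = time.perf_counter()
    return {
        "rows": len(rows),
        "fetch_seconds": t1 - t0,
        "process_seconds": t2 - t1,
    }


if __name__ == "__main__":
    # Formatting every row as text is only a placeholder for real UI work.
    stats = time_stages(make_demo_db(), lambda rows: [str(r) for r in rows])
    print(stats)
```

If the processing stage dominates, the fix belongs in the application (fewer rows, lazier rendering) rather than in the database.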

If you find out SQL Server is taking a long time to execute, turn on "Include Actual Execution Plan" in the Query menu and rerun your query. A new tab called "Execution Plan" appears; it shows what SQL Server is doing when you do a select on this table, as well as the percentage of time spent in each step. In my case it spent 100% of the time in a "Clustered Index Scan" of PK_Leads.

Edited to include more stats.

In general:

  1. Find out what takes so much time: executing the query or retrieving the results.
  2. If it's the query execution, the query plan will tell you which indexes are missing. Just press the display query plan button in SSMS and you will get hints on which indexes to create to increase performance.
  3. If it's the retrieval of the values, there is not much you can do about it besides upgrading hardware (RAM, disk, network, etc.).

But:
In your case it looks like the query is a full table scan, which is never good for performance. Check whether you really need to retrieve all this data at once.
Since there are no clauses whatsoever, it's unlikely that the query execution is the problem, meaning additional indexes will not help.

You will need to change the way the application accesses the data. Instead of loading all clients into memory and then searching them in memory, you will need to pass the search term on to the database query.

LinqToSql enables you to use different features for searching values; here is a blog post describing most of them: http://davidhayden.com/blog/dave/archive/2007/11/23/LINQToSQLLIKEOperatorGeneratingLIKESQLServer.aspx
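Whatever the ORM, the searches end up as SQL LIKE patterns. As an illustration of the three usual variants (starts with, ends with, contains), here is a sketch in Python with sqlite3 as a stand-in. The helper names and sample data are invented, and the backslash escape character is a choice made for this example so that a user typing % or _ searches for those characters literally:

```python
import sqlite3


def like_pattern(term, mode="contains"):
    """Build a LIKE pattern for `term`, escaping LIKE wildcards with a backslash."""
    escaped = (
        term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    )
    if mode == "starts_with":
        return escaped + "%"
    if mode == "ends_with":
        return "%" + escaped
    if mode == "contains":
        return "%" + escaped + "%"
    raise ValueError(f"unknown mode: {mode!r}")


def find_names(conn, term, mode="contains"):
    """Return matching client names, letting the database do the filtering."""
    cursor = conn.execute(
        "SELECT name FROM client WHERE name LIKE ? ESCAPE '\\' ORDER BY name",
        (like_pattern(term, mode),),
    )
    return [row[0] for row in cursor]


def make_demo_db():
    """Tiny in-memory table with invented names, for demonstration only."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE client (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO client (name) VALUES (?)",
        [("Ann Lee",), ("Lee Ann",), ("Joanna",), ("50% Off",)],
    )
    return conn


if __name__ == "__main__":
    conn = make_demo_db()
    for mode in ("starts_with", "ends_with", "contains"):
        print(mode, find_names(conn, "ann", mode))
```

Of the three, only the starts-with form has a fixed leading prefix, so it is the one an ordinary index on the name column is most likely to help with.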
