
Large SQL dataset query using Java

I have the following configuration:

  • SQL Server 2008
  • Java as backend technology - Spring + Hibernate

Basically, what I want to do is run a SELECT with a WHERE clause on a table. The problem is that the table has about 700M rows and the query takes a really long time.

Can you give me some pointers on where to optimize the query, or what sort of techniques I can use to improve performance?

Thanks.

Using indexes is the standard technique for dealing with this problem. As requested, here are some pointers that should get you started:

The first thing I do in this case is isolate whether the amount of data I am returning is the problem (an I/O issue). A simple, non-scientific way to do this is to change your query to return just the count:

select count(*) --just return a count, no data!
from MyTable
inner join MyOtherTable on ...
where ...

If this runs very quickly, it tells you your indexes are in order (assuming no sub-selects in your WHERE clause). If not, then you need to work on the indexes, the WHERE clause, or the query construction itself (the JOINs being done, etc.).

Once that is satisfactory, add your SELECT clause back in. If the query is slow again, you are going to have to look at your data access pattern:

  • Can you return fewer columns?
  • Can you return fewer rows at once?
  • Is there caching you can do in the application layer?
  • Is this query a candidate for partitioned/materialized views (if your database supports those)?
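To make the application-layer caching question concrete, here is a minimal sketch of a small LRU cache that sits in front of an expensive query. The class name `QueryResultCache` and the `loader` callback are made up for illustration; in a real Spring + Hibernate application you would more likely rely on an existing cache provider, and you would need to decide when cached entries become stale.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical LRU cache for query results. Not thread-safe. */
final class QueryResultCache<K, V> {
    private final Map<K, V> entries;

    QueryResultCache(final int maxEntries) {
        // accessOrder = true keeps the least recently used entry first,
        // so removeEldestEntry evicts it once the cache is over capacity.
        this.entries = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /**
     * Returns the cached value for the key. The loader (the real query)
     * runs only on a cache miss. Null results are not cached.
     */
    V get(K key, Function<K, V> loader) {
        V value = entries.get(key);
        if (value == null) {
            value = loader.apply(key);
            if (value != null) {
                entries.put(key, value);
            }
        }
        return value;
    }

    int size() {
        return entries.size();
    }
}
```

The key would typically be whatever identifies the query (for example its filter values), and the loader would be the call that actually hits the database.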

I would run SQL Server Profiler to find the exact query that is being generated. ORMs can create less-than-optimal queries. Once you know the query, you can run it in SSMS and look at the execution plan. This will give you clues as to where your performance problems are.
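If attaching Profiler is inconvenient, Hibernate can also print the SQL it generates. The sketch below assumes the standard Hibernate settings `hibernate.show_sql`, `hibernate.format_sql`, and `hibernate.use_sql_comments`. The class name `HibernateSqlLogging` is made up for illustration, and how the properties reach Hibernate depends on how your Spring configuration builds the session factory.

```java
import java.util.Properties;

/** Hypothetical helper that collects Hibernate's SQL logging settings. */
final class HibernateSqlLogging {
    private HibernateSqlLogging() {
    }

    static Properties settings() {
        Properties props = new Properties();
        // Print every generated SQL statement to standard output.
        props.setProperty("hibernate.show_sql", "true");
        // Pretty-print the statements so long joins stay readable.
        props.setProperty("hibernate.format_sql", "true");
        // Add comments that hint at which query produced each statement.
        props.setProperty("hibernate.use_sql_comments", "true");
        return props;
    }
}
```

The logged statements show `?` placeholders rather than the bound values, so Profiler is still the better tool when you need the exact statement the server received.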

Several things that can cause performance problems:

  • Lack of correct indexing (foreign keys used in joins should be indexed, as should the columns used as criteria in the WHERE clause)
  • Lack of sargability in the WHERE clause, which prevents the query from using existing indexes
  • Returning more columns than are needed
  • Correlated subqueries and scalar functions that cause row-by-agonizing-row operations
  • Returning too much data (will anybody really look at 1 million returned records? You only want to return the amount you show on a page, not the whole possible recordset)
  • Locking and blocking
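As an illustration of the sargability point, a common case is a filter that wraps the column in a function. A rough sketch, assuming a hypothetical indexed `created_at` datetime column and `id` column on `MyTable`: instead of filtering on `YEAR(created_at)`, compute a half-open range in Java and compare the bare column against it, so the optimizer is free to seek on the index.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;

/** Hypothetical helper: turns "rows from year N" into an index-friendly range. */
final class YearRange {
    // Non-sargable: the function call on the column usually prevents an index seek.
    static final String NON_SARGABLE_SQL =
            "SELECT id FROM MyTable WHERE YEAR(created_at) = ?";

    // Sargable: the column is compared directly against two parameters.
    static final String SARGABLE_SQL =
            "SELECT id FROM MyTable WHERE created_at >= ? AND created_at < ?";

    final LocalDateTime startInclusive;
    final LocalDateTime endExclusive;

    private YearRange(LocalDateTime startInclusive, LocalDateTime endExclusive) {
        this.startInclusive = startInclusive;
        this.endExclusive = endExclusive;
    }

    /** Builds the range [Jan 1 of year, Jan 1 of the following year). */
    static YearRange of(int year) {
        LocalDateTime start = LocalDate.of(year, 1, 1).atStartOfDay();
        return new YearRange(start, start.plusYears(1));
    }
}
```

The two bounds would then be bound as the parameters of `SARGABLE_SQL` (for example via `java.sql.Timestamp.valueOf(...)` on a `PreparedStatement`). Comparing the execution plans of the two statements in SSMS should show whether the rewrite actually changes a scan into a seek on your data.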

There's more (after all, whole very long books are written on this subject), but that should be enough to get you started on where to look.

You should create indexes on the columns you often use to restrict the result. The other thing to consider is pagination of the result set.
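For the pagination part, here is a minimal sketch of how a page request could be turned into a row window. SQL Server 2008 has no OFFSET/FETCH (that arrived in SQL Server 2012), so the query numbers the rows with `ROW_NUMBER()` instead. The class `PageWindow` and the columns `id`, `name`, and `status` are assumptions made up for the example.

```java
/** Hypothetical helper that maps a 1-based page number to a row window. */
final class PageWindow {
    // Rows are numbered in a derived table, then only one window is returned.
    static final String PAGE_SQL =
            "SELECT id, name FROM ("
            + "SELECT id, name, ROW_NUMBER() OVER (ORDER BY id) AS rn "
            + "FROM MyTable WHERE status = ?"
            + ") AS numbered "
            + "WHERE rn BETWEEN ? AND ? ORDER BY rn";

    final long firstRow; // inclusive, 1-based
    final long lastRow;  // inclusive

    private PageWindow(long firstRow, long lastRow) {
        this.firstRow = firstRow;
        this.lastRow = lastRow;
    }

    static PageWindow of(int page, int pageSize) {
        if (page < 1 || pageSize < 1) {
            throw new IllegalArgumentException("page and pageSize must be >= 1");
        }
        // Widen to long before multiplying so large page numbers cannot overflow.
        long first = (long) (page - 1) * pageSize + 1;
        return new PageWindow(first, first + pageSize - 1);
    }
}
```

`PageWindow.of(3, 50)` gives rows 101 to 150, which become the second and third parameters of `PAGE_SQL`. At the ORM level, Hibernate's `setFirstResult` and `setMaxResults` express the same idea.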

Regardless of the specific DB, I would do the following:

  1. Run an explain analyze on the query.
  2. Make sure you have an index on the columns that are part of your WHERE clause.
  3. If the indexes are OK, it's very likely that you are fetching a lot of records from disk, which is very slow. If you really cannot refine your query so that it fetches fewer records, consider clustering your table to improve the disk locality of your records.
