
4 million rows of data in a table

I have 4 million rows of data in a table and I run this query:

Select * from B

When I check CPU usage, this query shows high CPU usage. My question is: how can I reduce the CPU usage of this SQL query?

Are you likely to read a report with four million rows in it? I certainly wouldn't.

And, if not, why are you generating it?

If you're dumping the entire table for something like backup purposes, there are probably better ways, specific to the DBMS you're using.

If you don't need all the data (or don't need all of it at once), anything that limits what's coming out (such as a where clause to limit rows and/or a more selective column list than select *) should help you out, as will proper indexing so the conditions in the where clause can be evaluated faster. This is especially true if the data is going "across the wire": you don't want to be sending unneeded gigabytes across the network.
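As a rough illustration of the point about where clauses, narrower column lists, and indexing, here is a minimal sketch using Python's built-in sqlite3 module. The table B and the columns COLA to COLD follow the names used elsewhere on this page; the sample data, the index name idx_b_cola_colb, and the choice of SQLite are assumptions made only for the demonstration, and other DBMSs report query plans differently.

```python
import sqlite3

def build_demo_db(rows=10_000):
    """Create an in-memory table B filled with hypothetical sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE B (COLA INTEGER, COLB INTEGER, COLC TEXT, COLD TEXT)")
    conn.executemany(
        "INSERT INTO B VALUES (?, ?, ?, ?)",
        ((i % 100, i % 7, f"c{i}", f"d{i}") for i in range(rows)),
    )
    conn.commit()
    return conn

def query_plan(conn, sql, params=()):
    """Return SQLite's plan description for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column holds the detail text

conn = build_demo_db()
# Only the columns that are needed, and only the rows that match.
sql = "SELECT COLA, COLB, COLC FROM B WHERE COLA = ? AND COLB = ?"

plan_before = query_plan(conn, sql, (42, 0))  # no index yet: the table is scanned
conn.execute("CREATE INDEX idx_b_cola_colb ON B (COLA, COLB)")
plan_after = query_plan(conn, sql, (42, 0))   # the planner can now use the index

matching = conn.execute(sql, (42, 0)).fetchall()
print(plan_before)
print(plan_after)
print(len(matching), "rows fetched out of 10000")
```

Comparing the two printed plans shows whether the where clause is being served by the index; the same check on a production DBMS would use that system's own plan tooling.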

If you really want all columns from all four million rows in normal output format at one time, you'll just have to suffer the performance hit. Databases offer all sorts of ways to efficiently get at data but, if you want the lot, there's not much they can do.

Having said that, there are ways to mitigate the impact, but it depends on how you have things set up. Some examples are:

  • Have your database replicated, using the primary copy for its intended purpose and a replica for reporting. Then hitting the replica won't affect the primary one.
  • If you can execute a number of smaller queries rather than one big one to get the same result, that's a possibility. For example, one query to get all records from 2015, another after some rest time to get the 2014 ones, and so on.
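The second idea, several smaller queries with a rest in between, might look something like the sketch below. It uses Python's built-in sqlite3 module purely so the example runs as-is; the columns id, rec_year and payload, the index name, the sample data, and the pause length are all hypothetical and would need to match your real schema and load pattern.

```python
import sqlite3
import time

def fetch_by_year(conn, years, pause_seconds=0.0):
    """Yield (year, rows) one year at a time, resting between queries."""
    for n, year in enumerate(years):
        if n and pause_seconds:
            time.sleep(pause_seconds)  # rest time between the smaller queries
        rows = conn.execute(
            "SELECT id, rec_year, payload FROM B WHERE rec_year = ? ORDER BY id",
            (year,),
        ).fetchall()
        yield year, rows

# Hypothetical sample data: three years with 1000 rows each.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE B (id INTEGER PRIMARY KEY, rec_year INTEGER, payload TEXT)")
conn.executemany(
    "INSERT INTO B (rec_year, payload) VALUES (?, ?)",
    ((2013 + i % 3, f"row {i}") for i in range(3000)),
)
conn.commit()
conn.execute("CREATE INDEX idx_b_rec_year ON B (rec_year)")  # keeps each slice cheap

counts = {}
for year, rows in fetch_by_year(conn, [2015, 2014, 2013], pause_seconds=0.01):
    counts[year] = len(rows)  # process or export each batch here
print(counts)
```

Because the function is a generator, only one year's rows are held in memory at a time, and the pause gives other work on the server a chance to run between slices.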

That's two things off the top of my head. No doubt there are others but, without knowing more detail, it's hard to advise on specifics.

Quite simply, reduce what you're generating.

Limit the selections

SELECT TOP (100) * FROM B

Include a where clause

SELECT * FROM B WHERE COLA = XXX AND COLB = YYY

Specify the columns

SELECT COLA, COLB, COLC, COLD FROM B WHERE COLA = XXX AND COLB = YYY

Or, if you must return all 4 million rows:

Set up a scheduled job that populates a temp or staging table (a view on its own cannot populate a table, so this would typically be a stored procedure or script) and have it run during "down time", such as the middle of the night. Then select from that table.
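A minimal sketch of that off-peak refresh, again using Python's built-in sqlite3 module so it can be run directly. The table name B_snapshot and the function refresh_snapshot are hypothetical, and the scheduling itself (cron, SQL Server Agent, or similar) is assumed to live outside this code.

```python
import sqlite3

def refresh_snapshot(conn):
    """Rebuild B_snapshot from B; intended to be called by an off-peak job."""
    with conn:  # both statements are committed together, or rolled back on error
        conn.execute("DELETE FROM B_snapshot")
        conn.execute("INSERT INTO B_snapshot SELECT COLA, COLB, COLC, COLD FROM B")

# Hypothetical setup: a live table and an identically shaped snapshot table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE B (COLA INTEGER, COLB INTEGER, COLC TEXT, COLD TEXT)")
conn.execute("CREATE TABLE B_snapshot (COLA INTEGER, COLB INTEGER, COLC TEXT, COLD TEXT)")
conn.executemany(
    "INSERT INTO B VALUES (?, ?, ?, ?)",
    [(i, i % 5, "c", "d") for i in range(1000)],
)
conn.commit()

refresh_snapshot(conn)  # what the nightly job would do

# A later daytime write to the live table does not touch the snapshot.
conn.execute("INSERT INTO B VALUES (1000, 0, 'c', 'd')")
conn.commit()

snapshot_rows = conn.execute("SELECT COUNT(*) FROM B_snapshot").fetchone()[0]
live_rows = conn.execute("SELECT COUNT(*) FROM B").fetchone()[0]
print(snapshot_rows, live_rows)
```

Reports then read from B_snapshot, so the expensive copy happens when the system is quiet; the data is only as fresh as the last refresh.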
