
Optimizing SQL Server query / table

I have a database table that receives close to 1 million inserts a day and needs to be searchable for at least a year. It sits on a big hard drive with lots of data, and the hardware it runs on is not that great either.

The table looks like this:

id      | tag_id  |  value  |  time 
----------------------------------------
279571     55         0.57    2013-06-18 12:43:22
...

tag_id refers to a tag such as AmbientTemperature or AmbientHumidity, and time is captured when the reading is taken from the sensor.

I'm querying this table for reports. For example, I want to see all data for tags 1, 55, 72, and 4 between 2013-11-1 and 2013-11-28 at 1 hour intervals.

SELECT time, tag_id, tag_name, value, friendly_name
FROM (
    SELECT time, tag_name, tag_id, value,friendly_name, 
        ROW_NUMBER() over (partition by tag_id,datediff(hour, 0, time)/1 order by time desc) as seqnum
    FROM tag_values tv 
    JOIN tag_names tn ON tn.id = tv.tag_id
    WHERE (tag_id = 1 OR tag_id = 55 OR tag_id = 72 OR tag_id = 4)
        AND time >= '2013-11-1' AND time < '2013-11-28'
    ) k
WHERE seqnum = 1
ORDER BY time;

Can I optimize this table or my query at all? How should I set up my indexes?

It's pretty slow with a table size of 100 million+ rows. It can take several minutes to get a data set of 7 days at an hourly interval with 3 tags in the query.

How should I set up my indexes?

I would try the following index:

CREATE /*UNIQUE*/ INDEX IX_MyTable_tag_id_time -- If this index could be unique then uncomment UNIQUE
ON dbo.tag_values (tag_id, time)
INCLUDE (value) -- Covered column
WITH (FILLFACTOR = 90); -- Needed to minimize page splits. You should test other values for fill factor to find optimum value for your workload. 90 is just an example. Default value is usually 0 or 100 (see http://technet.microsoft.com/en-us/library/ms190470.aspx) 
GO
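To check whether an index like this actually helps, one option is to compare logical reads and elapsed time before and after creating it. The following is only a sketch: it reuses the table and column names from the question and leaves out the join to tag_names to keep the measurement focused on tag_values.

```sql
-- Report I/O and timing figures in the Messages output for each statement.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Same shape as the reporting query, restricted to tag_values.
SELECT time, tag_id, value
FROM (
    SELECT tv.time, tv.tag_id, tv.value,
           ROW_NUMBER() OVER (PARTITION BY tv.tag_id, DATEDIFF(hour, 0, tv.time)
                              ORDER BY tv.time DESC) AS seqnum
    FROM dbo.tag_values AS tv
    WHERE tv.tag_id IN (1, 55, 72, 4)
      AND tv.time >= '20131101' AND tv.time < '20131128'
) AS k
WHERE seqnum = 1
ORDER BY time;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Run it once without the index and once with it, and compare the "logical reads" reported for tag_values. Looking at the actual execution plan should also show whether the new index is used with a seek on (tag_id, time).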

Filtering on the result of the row number function will make the query painfully slow. It will also prevent optimal index use.

If your primary reporting need is hourly information, you might want to consider storing which row is the hourly sensor reading for a tag in a specific hour (in the code below, the latest reading in each hour, matching the ORDER BY time DESC in your query).

ALTER TABLE tag_values ADD IsHourlySensorReading BIT NULL;

In an hourly process, you calculate this column for new rows.

DECLARE @CalculateFrom DATETIME = (SELECT MIN(time) FROM tag_values WHERE IsHourlySensorReading IS NULL);
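-- Truncate to the start of the hour: add the whole hours since the base date (0) back onto the base date.
SET @CalculateFrom = dateadd(hour, datediff(hour, 0, @CalculateFrom), 0);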

UPDATE k
SET IsHourlySensorReading = CASE seqnum WHEN 1 THEN 1 ELSE 0 END
FROM (
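    -- IsHourlySensorReading must be in the select list so it can be updated through the derived table k.
    SELECT id, IsHourlySensorReading, row_number() over (partition by tag_id,datediff(hour, 0, time)/1 order by time desc) as seqnum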
    FROM tag_values tv
    WHERE tv.time >= @CalculateFrom
    AND tv.IsHourlySensorReading IS NULL
) as k

Your reporting query then becomes much simpler:

SELECT time, tag_id, tag_name, value, friendly_name
FROM (
    SELECT time, tag_name, tag_id, value,friendly_name
    FROM tag_values tv 
    JOIN tag_names tn ON tn.id = tv.tag_id
    WHERE (tag_id = 1 OR tag_id = 55 OR tag_id = 72 OR tag_id = 4)
        AND time >= '2013-11-1' AND time < '2013-11-28'
        AND IsHourlySensorReading=1
    ) k
ORDER BY time;

The following index will help with calculating the IsHourlySensorReading column. But remember, indexes will also cause your million inserts per day to take more time. Test thoroughly!

CREATE NONCLUSTERED INDEX tag_values_ixnc01 ON tag_values (time, IsHourlySensorReading) WHERE (IsHourlySensorReading IS NULL);

Use this index for reporting if you need ORDER BY time.

CREATE NONCLUSTERED INDEX tag_values_ixnc02 ON tag_values (time, tag_id, IsHourlySensorReading) INCLUDE (value) WHERE (IsHourlySensorReading = 1);

Use this index for reporting if you don't need ORDER BY time. It is an alternative to the previous index and shares its name, so create only one of the two.

CREATE NONCLUSTERED INDEX tag_values_ixnc02 ON tag_values (tag_id, time, IsHourlySensorReading) INCLUDE (value) WHERE (IsHourlySensorReading = 1);

Some additional things to consider: 还需要考虑一些其他事项:

  • Is ORDER BY time really required?
  • Table partitioning can seriously improve both insert and query performance. Depending on your situation I would partition on either tag_id or date.
  • Instead of creating a column with an IsHourlySensorReading indicator, you can also create a separate table/database for specific reporting requirements and only load the relevant data into that.
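The separate reporting table from the last point could look roughly like the sketch below. The table name tag_values_hourly, the hour_start column, the column types, and the load window variables are all assumptions made for illustration; nothing in the question defines them.

```sql
-- Hypothetical reporting table: one row per tag per hour.
CREATE TABLE dbo.tag_values_hourly (
    tag_id     int      NOT NULL,
    hour_start datetime NOT NULL,  -- start of the hour the reading belongs to
    value      float    NOT NULL,  -- assumed type for the sensor value
    time       datetime NOT NULL,  -- original timestamp of the reading
    CONSTRAINT PK_tag_values_hourly PRIMARY KEY CLUSTERED (tag_id, hour_start)
);

-- Load one window of completed hours (window bounds are placeholders).
DECLARE @From datetime = '20131101', @To datetime = '20131102';

INSERT INTO dbo.tag_values_hourly (tag_id, hour_start, value, time)
SELECT tag_id,
       DATEADD(hour, DATEDIFF(hour, 0, time), 0),  -- truncate to the hour
       value,
       time
FROM (
    SELECT tag_id, value, time,
           ROW_NUMBER() OVER (PARTITION BY tag_id, DATEDIFF(hour, 0, time)
                              ORDER BY time DESC, id DESC) AS seqnum
    FROM dbo.tag_values
    WHERE time >= @From AND time < @To
) AS k
WHERE seqnum = 1;  -- keep the latest reading in each hour, as in the question
```

Reports would then filter dbo.tag_values_hourly on tag_id and hour_start, which touches at most 24 rows per tag per day instead of the raw readings. The load should only cover hours that are already complete; otherwise a later reading in the same hour would collide with the primary key.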

I'm not an expert on SQL Server, but I would seriously consider setting this up as a partitioned table. This would also make archiving easier, as partitions could simply be dropped (rather than running an expensive DELETE FROM ... WHERE ...).

Also (with a bit of luck) the optimiser will only look in the partitions required for the data.
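As a rough illustration of the partitioning idea, here is a sketch with monthly partitions on time. The object names (pf_tag_values_month, ps_tag_values_month, tag_values_partitioned), the column types, and the boundary dates are assumptions, and everything is placed on the PRIMARY filegroup for simplicity. Note that before SQL Server 2016 SP1 table partitioning required Enterprise edition, and TRUNCATE TABLE ... WITH (PARTITIONS ...) needs SQL Server 2016 or later.

```sql
-- Monthly boundaries; RANGE RIGHT means each boundary date starts a new partition.
CREATE PARTITION FUNCTION pf_tag_values_month (datetime)
AS RANGE RIGHT FOR VALUES ('20131001', '20131101', '20131201');

CREATE PARTITION SCHEME ps_tag_values_month
AS PARTITION pf_tag_values_month ALL TO ([PRIMARY]);

-- Assumed table definition; the unique clustered key must include the partitioning column.
CREATE TABLE dbo.tag_values_partitioned (
    id     bigint IDENTITY(1,1) NOT NULL,
    tag_id int      NOT NULL,
    value  float    NOT NULL,
    time   datetime NOT NULL,
    CONSTRAINT PK_tag_values_partitioned PRIMARY KEY CLUSTERED (time, id)
) ON ps_tag_values_month (time);

-- See how rows are spread across partitions.
SELECT $PARTITION.pf_tag_values_month(time) AS partition_number, COUNT(*) AS row_count
FROM dbo.tag_values_partitioned
GROUP BY $PARTITION.pf_tag_values_month(time);

-- Archiving: empty the oldest partition (everything before 2013-10-01),
-- then remove its boundary so the empty partition is merged away.
TRUNCATE TABLE dbo.tag_values_partitioned WITH (PARTITIONS (1));
ALTER PARTITION FUNCTION pf_tag_values_month() MERGE RANGE ('20131001');

-- Before a new month starts, add a boundary for it.
ALTER PARTITION SCHEME ps_tag_values_month NEXT USED [PRIMARY];
ALTER PARTITION FUNCTION pf_tag_values_month() SPLIT RANGE ('20140101');
```

With this layout, a query that filters on a time range gives the optimiser the chance to skip partitions outside that range, and removing a month of old data becomes a metadata-level operation rather than a row-by-row delete.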
