Optimizing SQL Server query / table
I have a database table that receives close to 1 million inserts a day and needs to be searchable for at least a year. Big hard drive, lots of data, and not very good hardware to put it on either.
The table looks like this:
id | tag_id | value | time
----------------------------------------
279571 55 0.57 2013-06-18 12:43:22
...
tag_id refers to a tag such as AmbientTemperature or AmbientHumidity, and time is captured when the reading is taken from the sensor.
I'm querying this table for reporting. For example, I want to see all data for tags 1, 55, 72, and 4 between 2013-11-1 and 2013-11-28 at 1 hour intervals.
SELECT time, tag_id, tag_name, value, friendly_name
FROM (
    SELECT time, tag_name, tag_id, value, friendly_name,
           ROW_NUMBER() OVER (PARTITION BY tag_id, DATEDIFF(hour, 0, time)/1 ORDER BY time DESC) AS seqnum
    FROM tag_values tv
    JOIN tag_names tn ON tn.id = tv.tag_id
    WHERE tag_id IN (1, 55, 72, 4)
      AND time >= '2013-11-1' AND time < '2013-11-28'
) k
WHERE seqnum = 1
ORDER BY time;
Can I optimize this table or my query at all? How should I set up my indexes?
It's pretty slow with a table size of 100 million+ rows. It can take several minutes to get a data set of 7 days at an hourly interval with 3 tags in the query.
How should I set up my indexes?
I would try the following index:
CREATE /*UNIQUE*/ INDEX IX_MyTable_tag_id_time -- If this index could be unique then uncomment UNIQUE
ON dbo.tag_values (tag_id, time)
INCLUDE (value) -- Covered column
WITH (FILLFACTOR = 90); -- Needed to minimize page splits. You should test other values for fill factor to find optimum value for your workload. 90 is just an example. Default value is usually 0 or 100 (see http://technet.microsoft.com/en-us/library/ms190470.aspx)
GO
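To check whether the index actually helps, one option is to compare I/O and timing figures for the report's filter before and after creating it. This is a minimal sketch using the table and column names from the question; the dbo schema is taken from the index statement above, and the figures appear in the Messages output of the client.

```sql
-- Report logical reads and CPU/elapsed time for each statement in this session.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Inner part of the report query; run it once before and once after creating the index.
SELECT tv.time, tv.tag_id, tv.value
FROM dbo.tag_values AS tv
WHERE tv.tag_id IN (1, 55, 72, 4)
  AND tv.time >= '20131101' AND tv.time < '20131128';

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

A clear drop in logical reads on tag_values after the index exists suggests the optimizer is seeking on (tag_id, time) instead of scanning the table.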
Filtering on the result of the row number function will make the query painfully slow. It will also prevent optimal index use.
If your primary reporting need is hourly information, you might want to consider storing which row is the representative sensor reading for a tag in a specific hour. With ORDER BY time DESC, as used here, that is the latest reading in the hour.
ALTER TABLE tag_values ADD IsHourlySensorReading BIT NULL;
In an hourly process, you calculate this column for new rows.
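-- Start from the hour containing the oldest row that has not been classified yet.
DECLARE @CalculateFrom DATETIME = (SELECT MIN(time) FROM tag_values WHERE IsHourlySensorReading IS NULL);
-- Truncate to the start of that hour: add the whole hours since base date 0 back to base date 0.
SET @CalculateFrom = DATEADD(hour, DATEDIFF(hour, 0, @CalculateFrom), 0);

UPDATE k
SET IsHourlySensorReading = CASE seqnum WHEN 1 THEN 1 ELSE 0 END
FROM (
    -- The updated column has to be exposed by the derived table.
    SELECT id, IsHourlySensorReading,
           ROW_NUMBER() OVER (PARTITION BY tag_id, DATEDIFF(hour, 0, time)/1 ORDER BY time DESC) AS seqnum
    FROM tag_values tv
    WHERE tv.time >= @CalculateFrom
      AND tv.IsHourlySensorReading IS NULL
) AS k;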
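The hourly bucketing relies on truncating a datetime to the start of its hour. A small sketch of that idiom in isolation, using the sample timestamp from the question's table (the variable name is made up for illustration):

```sql
-- Truncate a datetime to the start of its hour:
-- count the whole hours since base date 0 (1900-01-01), then add them back to that base date.
DECLARE @reading DATETIME = '2013-06-18T12:43:22';

SELECT DATEADD(hour, DATEDIFF(hour, 0, @reading), 0) AS hour_start;
-- hour_start: 2013-06-18 12:00:00.000
```

The hour count goes in the second argument of DATEADD and the base date in the third; swapping them does not produce an hour boundary.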
Your reporting query then becomes much simpler:
SELECT time, tag_id, tag_name, value, friendly_name
FROM tag_values tv
JOIN tag_names tn ON tn.id = tv.tag_id
WHERE tag_id IN (1, 55, 72, 4)
  AND time >= '2013-11-1' AND time < '2013-11-28'
  AND IsHourlySensorReading = 1
ORDER BY time;
The following index will help with calculating the IsHourlySensorReading column. But remember, indexes will also cause your million inserts per day to take more time. Test thoroughly!
CREATE NONCLUSTERED INDEX tag_values_ixnc01 ON tag_values (time, IsHourlySensorReading) WHERE (IsHourlySensorReading IS NULL);
Use this index for reporting if you need to order by time.
CREATE NONCLUSTERED INDEX tag_values_ixnc02 ON tag_values (time, tag_id, IsHourlySensorReading) INCLUDE (value) WHERE (IsHourlySensorReading = 1);
Use this index for reporting if you don't need to order by time.
CREATE NONCLUSTERED INDEX tag_values_ixnc02 ON tag_values (tag_id, time, IsHourlySensorReading) INCLUDE (value) WHERE (IsHourlySensorReading = 1);
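To weigh the read benefit of these indexes against their cost on the insert path, one option is to look at the index usage DMV after the system has run under a normal workload for a while. A sketch, assuming the table lives in the dbo schema and that you have permission to view server state:

```sql
-- Read vs. write activity per index on tag_values since the last SQL Server restart.
SELECT i.name AS index_name,
       s.user_seeks,
       s.user_scans,
       s.user_lookups,
       s.user_updates   -- insert/update/delete operations that had to maintain this index
FROM sys.indexes AS i
LEFT JOIN sys.dm_db_index_usage_stats AS s
       ON s.object_id = i.object_id
      AND s.index_id = i.index_id
      AND s.database_id = DB_ID()
WHERE i.object_id = OBJECT_ID('dbo.tag_values')
ORDER BY i.index_id;
```

An index with many user_updates but few seeks or scans is mostly costing insert time without helping the reports.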
Some additional things to consider:
I'm not an expert on SQL Server, but I would seriously consider setting this up as a partitioned table. This would also make archiving easier, as partitions could simply be dropped (rather than running an expensive delete from where...).
Also (with a bit of luck) the optimiser will only look in the partitions required for the data.
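As a rough sketch of what that could look like in SQL Server, assuming monthly partitions on the time column: all object names, the column types, and the single-filegroup layout below are assumptions for illustration, not something stated in the question. As far as I know, table partitioning required Enterprise Edition before SQL Server 2016 SP1.

```sql
-- Assumption: monthly partitions on the time column; all names below are hypothetical.
CREATE PARTITION FUNCTION pf_tag_values_month (DATETIME)
AS RANGE RIGHT FOR VALUES ('20130201', '20130301', '20130401');  -- add one boundary per month

-- Assumption: every partition lives on the PRIMARY filegroup.
CREATE PARTITION SCHEME ps_tag_values_month
AS PARTITION pf_tag_values_month ALL TO ([PRIMARY]);

-- Assumed column types; the question does not state them.
CREATE TABLE dbo.tag_values_partitioned (
    id     BIGINT IDENTITY(1,1) NOT NULL,
    tag_id INT      NOT NULL,
    value  FLOAT    NOT NULL,
    time   DATETIME NOT NULL,
    -- A unique index aligned with the partition scheme has to include the partitioning column.
    CONSTRAINT PK_tag_values_partitioned PRIMARY KEY CLUSTERED (tag_id, time, id)
) ON ps_tag_values_month (time);

-- Which partition would hold a given reading? Values before the first boundary fall in partition 1.
SELECT $PARTITION.pf_tag_values_month(CAST('20130115' AS DATETIME)) AS partition_number;

-- Archiving: empty the oldest partition without a row-by-row DELETE (SQL Server 2016 or later).
TRUNCATE TABLE dbo.tag_values_partitioned WITH (PARTITIONS (1));
```

For the optimiser to skip partitions, the report query needs a filter on time, as the one in the question already has.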