简体   繁体   中英

Handling large amounts of data in SQL

I've just taken over a project at work, and my boss has asked me to make it run faster. Great.

So I've identified one of the major bottlenecks to be searching through one particular table from our SQL server, which can take up to a minute , sometimes longer, for a select query with some filters on it to run. Below is the SQL generated by C# Entity Framework (minus all the GO statements):

CREATE TABLE [dbo].[MachineryReading](
    [Id] [int] IDENTITY(1,1) NOT NULL,
    [Location] [geometry] NULL,
    [Latitude] [float] NOT NULL,
    [Longitude] [float] NOT NULL,
    [Altitude] [float] NULL,
    [Odometer] [int] NULL,
    [Speed] [float] NULL,
    [BatteryLevel] [int] NULL,
    [PinFlags] [bigint] NOT NULL, -- Deprecated field, this is now stored in a separate table
    [DateRecorded] [datetime] NOT NULL,
    [DateReceived] [datetime] NOT NULL,
    [Satellites] [int] NOT NULL,
    [HDOP] [float] NOT NULL,
    [MachineryId] [int] NOT NULL,
    [TrackerId] [int] NOT NULL,
    [ReportType] [nvarchar](1) NULL,
    [FixStatus] [int] NOT NULL,
    [AlarmStatus] [int] NOT NULL,
    [OperationalSeconds] [int] NOT NULL,
 CONSTRAINT [PK_dbo.MachineryReading] PRIMARY KEY NONCLUSTERED 
(
    [Id] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
)

ALTER TABLE [dbo].[MachineryReading] ADD  DEFAULT ((0)) FOR [FixStatus]
ALTER TABLE [dbo].[MachineryReading] ADD  DEFAULT ((0)) FOR [AlarmStatus]
ALTER TABLE [dbo].[MachineryReading] ADD  DEFAULT ((0)) FOR [OperationalSeconds]
ALTER TABLE [dbo].[MachineryReading] WITH CHECK ADD  CONSTRAINT [FK_dbo.MachineryReading_dbo.Machinery_MachineryId] FOREIGN KEY([MachineryId])
REFERENCES [dbo].[Machinery] ([Id])
  ON DELETE CASCADE
ALTER TABLE [dbo].[MachineryReading] CHECK CONSTRAINT [FK_dbo.MachineryReading_dbo.Machinery_MachineryId]
ALTER TABLE [dbo].[MachineryReading] WITH CHECK ADD  CONSTRAINT [FK_dbo.MachineryReading_dbo.Tracker_TrackerId] FOREIGN KEY([TrackerId])
  REFERENCES [dbo].[Tracker] ([Id])
  ON DELETE CASCADE
ALTER TABLE [dbo].[MachineryReading] CHECK CONSTRAINT [FK_dbo.MachineryReading_dbo.Tracker_TrackerId]

The table has indexes on MachineryId , TrackerId , and DateRecorded :

CREATE NONCLUSTERED INDEX [IX_MachineryId] ON [dbo].[MachineryReading]
(
    [MachineryId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

CREATE NONCLUSTERED INDEX [IX_MachineryId_DateRecorded] ON [dbo].[MachineryReading]
(
    [MachineryId] ASC,
    [DateRecorded] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

CREATE NONCLUSTERED INDEX [IX_TrackerId] ON [dbo].[MachineryReading]
(
    [TrackerId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)

When we select from this table, we are almost always interested in one machinery or tracker, over a given date range:

SELECT *
FROM MachineryReading
WHERE MachineryId = 2127 AND
      DateRecorded > '2016-12-08 00:00:10.009' AND DateRecorded < '2016-12-11 18:32:41.734'

As you can see, it's quite a basic setup. The main problem is the sheer amount of data we put into it - about one row every ten seconds per tracker, and we have over a hundred trackers at the moment. We're currently sitting somewhere around 10-15 million rows. So this leaves me with two questions.

  • Am I thrashing the database if I insert 10 rows per second (without batching them)?
  • Given that this is historical data, so once it is inserted it will never change, is there anything I can do to speed up read access?
  1. You have too many non-clustered indexes on the table - which will increase the size of the DB.

If you have an index on MachineryId and DateRecorded - you don't really need a separate one on MachineryId .

With 3 of your Non-Clustered indexes - there are 3 more copies of the data

Clustered VS Non-Clustered

No Include on the Non-Clustered index

When SQL Server is executing your SQL it is first searching the Non-Clustered Index for the required data, then it is going back to the original table ( bookmark lookup ) Link and getting the rest of the columns as you are doing select * , but the non-clustered index doesn't have all the columns (That is what I think is happening - Can't really tell without the Query Plan)

Include columns in non-clustered index: https://stackoverflow.com/a/1308325/1910735

  1. You should maintain you indexes - by creating a maintenance plan to check for fragmentation and rebuild or reorganize your indexes on weekly basis.

  2. I really think you should have a Clustered index on your MachineryId and DateRecordred instead of a Non-Clustered index. A table can only have one Clustered Index ( this is the order data is stored on the Hard Disk) - as Most of your queries will be in DateRecordred and MachineryId order - it will be better to store them that way,

Also if you really are searching by TrackerId in any query, try adding it to the same Clustered Index

IMPORTANT NOTE: DELETE THE NON-CLUSTERED INDEX in TEST environment before going LIVE

Create a clustered index instead of your non-clustered index, run different queries - Check the performance by comparing the Query Plans and the STATISTICS IO )

Some resources for Index and SQL Query help:

Subscribe to the newsletter here and download the first responder kit: https://www.brentozar.com/?s=first+responder

It is now open source - but I don't know if it has the actual PDF getting started and help files (Subscribe in the above link anyway - for weekly articles/tutorials)

https://github.com/BrentOzarULTD/SQL-Server-First-Responder-Kit

Tuning is per query, but in any case -
I see you have no partitions and no indexes, which means, no matter what you do. it always results in a full table scan.

For your specific query -

create index MachineryReading_ix_MachineryReading_DateRecorded 
    on (MachineryReading,DateRecorded)

First, 10 inserts per second is very feasible under almost any reasonable circumstances.

Second, you need an index. For this query:

SELECT *
FROM MachineryReading
WHERE MachineryId = 2127 AND
      DateRecorded > '2016-12-08 00:00:10.009' AND DateRecorded < '2016-12-11 18:32:41.734';

You need an index on MachineryReading(MachineryId, DateRecorded) . That will probably solve your performance problem.

If you have similar queries for tracker, then you want an index on MachineryReading(TrackerId, DateRecorded) .

These will slightly impede the progress of in the inserts . But the overall improvement should be so great, that all will be a big win.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM