I'm trying to learn SQL and database design and need help choosing a good design for my database in this case. I'm using C# and MySQL.
My input data in this lesson consists of energy meters, each with a unique identification number, and every meter delivers one value per hour. I have data from 2013 onward, and collection will continue for an unspecified time; my best guess is 5 years ahead. There are roughly 25,000 meters, so there will be 25,000 * 24 = 600,000 data points a day. I receive this data once a day via file. The number of meters changes at a slow pace: around 500 changes per year, adding and removing meters. As a bonus, I would like to record when each value was added to the database so I can calculate a performance index for the collection system. So this is the input data for each meter:
Every meter delivers only one type of data, so I can store the data type in a separate table; the data itself then consists of anonymous decimal values. This is where my problem begins. I have tried some different design approaches:
All of the solutions above lead to quite slow performance when adding data to the database.
If I search Stack Overflow and elsewhere for database design with a large number of columns, I always find the answer "Normalize!", but I do not know how to apply it in my case because of my novice experience. I have a unique timestamp (valuetime) and a unique meter ID, which is why I call my data rectangular.
Can someone please lead me to the right path?
For your input data:
Meter Table:
ID int PK IDENTITY(1, 1)
MeterName varchar
ReadingsTable:
ID int PK IDENTITY(1, 1)
MeterID int FK
Value decimal
TimeStamp datetime
DateAdded date
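Since the question targets MySQL while the schema above is written in SQL Server style (IDENTITY), a rough MySQL translation might look like the following. Table and column names follow the answer; the AUTO_INCREMENT clauses, column sizes, and DECIMAL precision are assumptions, so this is only a sketch:

```sql
-- Sketch of the schema above in MySQL syntax.
-- AUTO_INCREMENT stands in for IDENTITY(1,1); sizes are guesses.
CREATE TABLE Meter (
    ID        INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    MeterName VARCHAR(50) NOT NULL
);

CREATE TABLE Readings (
    ID          INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    MeterID     INT NOT NULL,
    Value       DECIMAL(12,3) NOT NULL,
    `TimeStamp` DATETIME NOT NULL,  -- hour the reading belongs to
    DateAdded   DATE NOT NULL,      -- when the row entered the database
    FOREIGN KEY (MeterID) REFERENCES Meter(ID)
);
```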
You should populate this with an ETL - make an SSIS package or something. Definitely better than a C# app, in my opinion.
Next, you can make aggregation tables:
DailyAggTable:
ID int PK IDENTITY(1, 1)
MeterID int FK
SumOfValue decimal
Date date
You can populate this after your ETL. You can make weekly, monthly, quarterly, yearly, etc. agg tables and schedule their population accordingly. This will improve reporting performance.
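Populating the daily aggregation table after the nightly load could be a single INSERT ... SELECT with a GROUP BY. This sketch assumes the schemas above and restricts the rollup to yesterday's readings; the date-window logic is an assumption about when the job runs:

```sql
-- Sketch: roll up yesterday's readings into the daily agg table.
-- Assumes this runs once, after the nightly load completes.
INSERT INTO DailyAgg (MeterID, SumOfValue, Date)
SELECT MeterID,
       SUM(Value),
       DATE(`TimeStamp`)
FROM   Readings
WHERE  `TimeStamp` >= CURDATE() - INTERVAL 1 DAY
  AND  `TimeStamp` <  CURDATE()
GROUP BY MeterID, DATE(`TimeStamp`);
```

Weekly, monthly, and yearly tables can be built the same way, either from the raw table or from the daily table.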
Building on Stan Shaw's Answer...
If the data is a CSV file, simply use LOAD DATA each night. You should probably load into a temp table, massage the data, then copy into the real table(s). Possibly no need for any C# code.
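The load-then-massage step might look like this; the file path, CSV layout, and staging-table name are all assumptions, so treat it as a sketch rather than a drop-in script:

```sql
-- Sketch: nightly bulk load into a staging table, then copy across.
-- File path and column order are assumptions about the daily file.
CREATE TEMPORARY TABLE ReadingsStage LIKE Readings;

LOAD DATA INFILE '/path/to/daily_readings.csv'
INTO TABLE ReadingsStage
FIELDS TERMINATED BY ','
IGNORE 1 LINES
(MeterID, `TimeStamp`, Value);

-- Massage/validate here if needed, then copy into the real table.
INSERT INTO Readings (MeterID, Value, `TimeStamp`, DateAdded)
SELECT MeterID, Value, `TimeStamp`, CURDATE()
FROM   ReadingsStage;
```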
DateAdded seems somewhat useless and clutters the table. Either remove it completely, or build another table to record the uploads.
Don't bother with an ID on the main table; (MeterID, TimeStamp) is the 'natural' PRIMARY KEY. Again, this saves space.
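Applying both points above, a slimmed-down main table might look like this; the column types are assumptions carried over from the earlier schema:

```sql
-- Sketch: main table keyed by the natural (MeterID, TimeStamp) pair,
-- with no surrogate ID and no DateAdded column.
CREATE TABLE Readings (
    MeterID     INT           NOT NULL,
    `TimeStamp` DATETIME      NOT NULL,
    Value       DECIMAL(12,3) NOT NULL,
    PRIMARY KEY (MeterID, `TimeStamp`)
);
```

The composite key also doubles as the index you will use most: lookups and range scans by meter and time hit it directly.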
I would build only daily summary rows in a single summary table. That table might be fast enough to handle weekly/monthly queries. Only if it is not fast enough, should you consider a summary of a summary.