
Database design with rectangular data

I'm trying to learn SQL and database design, and I need some help choosing a good design for my database in this case. I'm using C# and MySQL.
My input data consists of readings from energy meters. Each meter has a unique identification number and delivers one value per hour. I have data from 2013 onward, and collection will continue for an unspecified time; my best guess is 5 more years. There are roughly 25 000 meters, so there will be 25e3 * 24 = 600 000 data points a day. I receive this data once a day as a file. The number of meters changes slowly: around 500 changes per year as meters are added and removed. As a bonus, I would like to record when each value was added to the database so that I can calculate a performance index for the collection system. So this is the input data for each meter:

  • Valuetime (datetime)
  • Value (decimal data)
  • Date_added (datetime)

Every meter delivers only one type of data, so I can keep the data type in a separate table and store the readings themselves as anonymous decimal values. This is where my problem begins. I have tried several design approaches:

  1. One large table with one row per hour and one column per meter. This failed because of the large number of columns, and I would need a separate, equally big table for “Date_added”.
  2. One table per meter, with columns valuetime, value and date_added. This failed because of slow performance in the C# program.
  3. Partitioned tables (i.e. table1 = meters whose ID begins with 1, and so forth). This still leads to many columns.
  4. Partitioned tables where table 10 = meters whose ID begins with 10, and so forth. This still leads to many columns.

All of the solutions above lead to quite slow performance when adding data to the database.

If I search Stack Overflow and elsewhere for database design with a large number of columns, I always find the answer “Normalize!”, but I do not know how to do that in my case because of my limited experience. Each valuetime is unique and each meter ID is unique, which is why I call my data rectangular.

Can someone please lead me to the right path?

For your input data:

Meter Table:

ID int PK IDENTITY(1, 1)
MeterName varchar

ReadingsTable:

ID int PK IDENTITY(1, 1)
MeterID int FK
Value decimal
TimeStamp datetime
DateAdded date

You should populate this with an ETL - make an SSIS package or something. Definitely better than a C# app, in my opinion.
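
Since the question is about MySQL, here is a rough sketch of the two tables above as MySQL DDL. IDENTITY(1, 1) is SQL Server syntax; the MySQL counterpart is AUTO_INCREMENT. The table names, column names, sizes and decimal precision are assumptions for illustration, and date_added is a DATETIME here because the question asks for a datetime.

```sql
-- Sketch only: names, lengths and precision are assumptions, not from the answer.
CREATE TABLE meter (
    id         INT UNSIGNED NOT NULL AUTO_INCREMENT,
    meter_name VARCHAR(64)  NOT NULL,
    PRIMARY KEY (id),
    UNIQUE KEY uq_meter_name (meter_name)
) ENGINE=InnoDB;

CREATE TABLE reading (
    id            BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    meter_id      INT UNSIGNED    NOT NULL,
    reading_value DECIMAL(12,3)   NOT NULL,
    value_time    DATETIME        NOT NULL,
    -- filled in automatically when the row is inserted
    date_added    DATETIME        NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (id),
    -- one value per meter per hour, so this pair must not repeat
    UNIQUE KEY uq_meter_time (meter_id, value_time),
    CONSTRAINT fk_reading_meter FOREIGN KEY (meter_id) REFERENCES meter (id)
) ENGINE=InnoDB;
```

Adding or removing a meter is then a row change in meter, not a schema change, which is the point of normalizing here.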

Next, you can make aggregation tables:

DailyAggTable:

ID int PK IDENTITY(1, 1)
MeterID int FK
SumOfValue decimal
Date date

You can populate this after your ETL. You can make weekly, monthly, quarterly, yearly, etc. agg tables and schedule their population accordingly. This will improve reporting performance.
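
A possible way to fill the daily table after each load, in MySQL syntax. This is a sketch under assumptions: the reading table and all column names are hypothetical, and the summary table uses (meter_id, reading_date) as its key instead of a separate ID so that the job can be re-run for the same day without creating duplicates.

```sql
-- Hypothetical source table; skipped if a table named reading already exists.
CREATE TABLE IF NOT EXISTS reading (
    meter_id      INT UNSIGNED  NOT NULL,
    value_time    DATETIME      NOT NULL,
    reading_value DECIMAL(12,3) NOT NULL,
    UNIQUE KEY uq_meter_time (meter_id, value_time)
) ENGINE=InnoDB;

CREATE TABLE IF NOT EXISTS daily_summary (
    meter_id     INT UNSIGNED  NOT NULL,
    reading_date DATE          NOT NULL,
    sum_of_value DECIMAL(14,3) NOT NULL,
    PRIMARY KEY (meter_id, reading_date)
) ENGINE=InnoDB;

-- Summarize yesterday's readings; re-running overwrites the same rows.
-- VALUES() in this position is deprecated from MySQL 8.0.20 but still accepted.
INSERT INTO daily_summary (meter_id, reading_date, sum_of_value)
SELECT meter_id, DATE(value_time), SUM(reading_value)
FROM reading
WHERE value_time >= CURRENT_DATE - INTERVAL 1 DAY
  AND value_time <  CURRENT_DATE
GROUP BY meter_id, DATE(value_time)
ON DUPLICATE KEY UPDATE sum_of_value = VALUES(sum_of_value);
```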

Building on Stan Shaw's Answer...

If the data is a CSV file, simply use LOAD DATA each night. You should probably load into a temp table, massage the data, then copy into the real table(s). Possibly no need for any C# code.
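
The load, massage and copy steps could look roughly like this. Everything about the file is an assumption: the path is a placeholder, and the layout (comma-separated, one header line, columns meter name, value time, value) is invented for illustration. The meter and reading tables and their columns are also hypothetical.

```sql
-- Hypothetical target tables; skipped if tables with these names already exist.
CREATE TABLE IF NOT EXISTS meter (
    id         INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    meter_name VARCHAR(64)  NOT NULL UNIQUE
) ENGINE=InnoDB;

CREATE TABLE IF NOT EXISTS reading (
    meter_id      INT UNSIGNED  NOT NULL,
    value_time    DATETIME      NOT NULL,
    reading_value DECIMAL(12,3) NOT NULL,
    UNIQUE KEY uq_meter_time (meter_id, value_time)
) ENGINE=InnoDB;

-- 1. Load the raw file into a temporary staging table.
CREATE TEMPORARY TABLE staging_reading (
    meter_name    VARCHAR(64)   NOT NULL,
    value_time    DATETIME      NOT NULL,
    reading_value DECIMAL(12,3) NOT NULL
);

-- LOCAL reads the file from the client machine and needs local_infile enabled.
LOAD DATA LOCAL INFILE '/path/to/readings.csv'   -- placeholder path
INTO TABLE staging_reading
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES                                   -- assumes a header row
(meter_name, value_time, reading_value);

-- 2. Register meters that appear in the file for the first time.
INSERT INTO meter (meter_name)
SELECT DISTINCT s.meter_name
FROM staging_reading s
LEFT JOIN meter m ON m.meter_name = s.meter_name
WHERE m.id IS NULL;

-- 3. Copy into the real table; IGNORE skips rows that are already present.
INSERT IGNORE INTO reading (meter_id, value_time, reading_value)
SELECT m.id, s.value_time, s.reading_value
FROM staging_reading s
JOIN meter m ON m.meter_name = s.meter_name;

DROP TEMPORARY TABLE staging_reading;
```

A single bulk load like this usually avoids the per-row round trips that tend to make an application-side insert loop slow.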

DateAdded seems somewhat useless and clutters the table. Either remove it completely, or build another table to record the uploads.

Don't bother with an ID on the main table; (MeterID, Timestamp) is the 'natural' PRIMARY KEY. Again, this saves space.
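
As a sketch, the main table with the natural key might be declared as below. The table name, column names and precision are assumptions. With InnoDB the rows are stored in primary key order, so all readings for one meter sit together and a per-meter time range query reads a contiguous slice.

```sql
-- Hypothetical names; no surrogate ID column, the pair itself is the key.
CREATE TABLE reading_natural_pk (
    meter_id      INT UNSIGNED  NOT NULL,
    value_time    DATETIME      NOT NULL,
    reading_value DECIMAL(12,3) NOT NULL,
    PRIMARY KEY (meter_id, value_time)
) ENGINE=InnoDB;

-- Typical lookup: one meter (example ID 42) over one month, served by the key.
SELECT value_time, reading_value
FROM reading_natural_pk
WHERE meter_id = 42
  AND value_time >= '2013-01-01'
  AND value_time <  '2013-02-01'
ORDER BY value_time;
```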

I would build only daily summary rows in a single summary table. That table might be fast enough to handle weekly/monthly queries. Only if it is not fast enough should you consider a summary of a summary.
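
To illustrate why the daily table may be enough: at roughly 25 000 meters it holds about 25 000 * 365 ≈ 9.1 million rows per year, compared with about 600 000 * 365 ≈ 219 million hourly rows. A monthly report can then be computed on the fly, as in this sketch (the table and column names are assumptions):

```sql
-- Hypothetical daily summary table; skipped if one with this name already exists.
CREATE TABLE IF NOT EXISTS daily_summary (
    meter_id     INT UNSIGNED  NOT NULL,
    reading_date DATE          NOT NULL,
    sum_of_value DECIMAL(14,3) NOT NULL,
    PRIMARY KEY (meter_id, reading_date)
) ENGINE=InnoDB;

-- Monthly totals per meter for 2013, derived from the daily rows.
SELECT meter_id,
       DATE_FORMAT(reading_date, '%Y-%m') AS ym,
       SUM(sum_of_value)                  AS monthly_sum
FROM daily_summary
WHERE reading_date >= '2013-01-01'
  AND reading_date <  '2014-01-01'
GROUP BY meter_id, DATE_FORMAT(reading_date, '%Y-%m')
ORDER BY meter_id, ym;
```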
