
Database design with rectangular data

I'm trying to learn SQL and database design, and in this case I need some help choosing a good design for my database. I'm using C# and MySQL.
My input data in this lesson consists of energy meters, each with a unique identification number, and every meter delivers one value per hour. I have data from 2013 onward, and this will continue for an unspecified future; my best guess is five years ahead. There are roughly 25 000 meters, so there will be 25e3 * 24 = 600 000 data points a day. I get this data once a day via file. The number of meters will change at a slow pace, around 500 changes per year (meters added and removed). As a bonus, I would like to know when each value was added to the database, to calculate a performance index for the collection system. So this is the input data for each meter:

  • Valuetime (datetime)
  • Value (decimal data)
  • Date_added (datetime)

Every meter delivers one type of data, so I can keep the data type in a separate table and store the data itself as anonymous decimal values. This is where my problem begins. I have tried some different design approaches:

  1. One large table with each row holding one hour of data and one column per meter. Failure due to the large number of columns, and I need a separate, equally big table for "Date_added".
  2. One table per meter, with columns valuetime, value and date_added. Failure due to slow performance in the C# program.
  3. Partitioned tables (i.e. table1 = meters beginning with 1, and so forth). This still leads to many columns.
  4. Partitioned tables where table 10 = meters beginning with 10, and so forth. This still leads to many columns.

All of the solutions above lead to quite slow performance when adding data to the database.

If I search Stack Overflow and elsewhere for database design with a large number of columns, the answer is always "Normalize!", but with my novice experience I do not know how to do that in my case. I have a unique value (valuetime) and a unique meter ID, which is why I call my data rectangular.

Can someone please lead me down the right path?

For your input data:

Meter Table:

ID int PK IDENTITY(1, 1)
MeterName varchar

ReadingsTable:

ID int PK IDENTITY(1, 1)
MeterID int FK
Value decimal
TimeStamp datetime
DateAdded date
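
Since the question uses MySQL, a minimal sketch of those two tables in MySQL syntax might look like the following (the column sizes and the BIGINT key are my assumptions, not part of the original answer):

CREATE TABLE Meter (
    ID        INT NOT NULL AUTO_INCREMENT,    -- surrogate key
    MeterName VARCHAR(50) NOT NULL,           -- the meter's unique identification number
    PRIMARY KEY (ID)
) ENGINE=InnoDB;

CREATE TABLE Readings (
    ID        BIGINT NOT NULL AUTO_INCREMENT, -- roughly 220 million rows per year, so BIGINT to be safe
    MeterID   INT NOT NULL,                   -- FK to Meter.ID
    Value     DECIMAL(12,4) NOT NULL,         -- the hourly reading
    TimeStamp DATETIME NOT NULL,              -- Valuetime: the hour the reading belongs to
    DateAdded DATE NOT NULL,                  -- when the row was loaded
    PRIMARY KEY (ID),
    FOREIGN KEY (MeterID) REFERENCES Meter(ID)
) ENGINE=InnoDB;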

You should populate this with an ETL process - make an SSIS package or something similar. Definitely better than a C# app, in my opinion.

Next, you can make aggregation tables:

DailyAggTable:

ID int PK IDENTITY(1, 1)
MeterID int FK
SumOfValue decimal
Date date

You can populate this after your ETL. You can make weekly, monthly, quarterly, yearly, etc. aggregation tables and schedule their population accordingly. This will improve reporting performance.
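
As an illustration only (the column layout and the composite key are assumptions layered on the DailyAggTable above), the daily aggregate could be rebuilt in MySQL right after the nightly load:

CREATE TABLE DailyAgg (
    MeterID    INT NOT NULL,
    Date       DATE NOT NULL,
    SumOfValue DECIMAL(16,4) NOT NULL,
    PRIMARY KEY (MeterID, Date)              -- one row per meter per day
) ENGINE=InnoDB;

-- Rebuild the rows loaded today; REPLACE makes re-runs harmless
REPLACE INTO DailyAgg (MeterID, Date, SumOfValue)
SELECT MeterID, DATE(TimeStamp), SUM(Value)
FROM   Readings
WHERE  DateAdded = CURDATE()
GROUP BY MeterID, DATE(TimeStamp);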

Building on Stan Shaw's answer...

If the data is a CSV file, simply use LOAD DATA each night. You should probably load into a temp table, massage the data, then copy into the real table(s). Possibly no need for any C# code at all.
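
A minimal sketch of that nightly flow, assuming the CSV holds meter name, value time and value (the file path and column names are placeholders):

-- Staging table, emptied and refilled every night
CREATE TABLE ReadingsStage (
    MeterName VARCHAR(50) NOT NULL,
    ValueTime DATETIME NOT NULL,
    Value     DECIMAL(12,4) NOT NULL
);

TRUNCATE TABLE ReadingsStage;

LOAD DATA INFILE '/path/to/daily_file.csv'
INTO TABLE ReadingsStage
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(MeterName, ValueTime, Value);

-- Massage/validate here, then copy into the real table
INSERT INTO Readings (MeterID, TimeStamp, Value, DateAdded)
SELECT m.ID, s.ValueTime, s.Value, CURDATE()
FROM   ReadingsStage s
JOIN   Meter m ON m.MeterName = s.MeterName;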

DateAdded seems somewhat useless, and it clutters the table. Either remove it completely, or build another table to record the uploads.

Don't bother with an ID on the main table; (MeterID, TimeStamp) is the 'natural' PRIMARY KEY. Again, this saves space.
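
For example, a leaner (assumed) definition of the main table under that suggestion would be:

CREATE TABLE Readings (
    MeterID   INT NOT NULL,
    TimeStamp DATETIME NOT NULL,
    Value     DECIMAL(12,4) NOT NULL,
    PRIMARY KEY (MeterID, TimeStamp)   -- natural key, no surrogate ID column
) ENGINE=InnoDB;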

I would build only daily summary rows, in a single summary table. That table might be fast enough to handle weekly/monthly queries. Only if it is not fast enough should you consider a summary of a summary.
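
For instance (reusing the DailyAgg sketch above), a monthly report for one year can be served from the daily rows alone:

SELECT MeterID,
       DATE_FORMAT(Date, '%Y-%m') AS Month,
       SUM(SumOfValue)            AS MonthlyTotal
FROM   DailyAgg
WHERE  Date >= '2014-01-01' AND Date < '2015-01-01'
GROUP BY MeterID, DATE_FORMAT(Date, '%Y-%m');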
