How do you handle database normalization design for optional columns?

I am working on a system that stores sensor data. Most sensors measure a single value, but some can measure many values in each sample period. I am trying to keep my database as normalized as possible without suffering performance problems when looking up large amounts of sample data. My question is how to design the sensor data table to account for optional measured values. For example, sensor A reads only one value, but sensor B reads 5 values. How do I store both sets of data in the data table?

Option 1 is a flat structure: a table with a bunch of value columns (Value 1, Value 2, Value 3 ... Value n) and a field that records how many of those columns are used. It is functional, but in my opinion a bad design:

Sensor Data
  Sensor ID (PK)
  Timestamp (PK)
  Columns Used
  Value 1
  Value 2
  Value 3
  ...
  Value n
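A minimal sketch of Option 1, using Python's built-in `sqlite3` module as a stand-in for SQL Server (an assumption made only so the example runs anywhere; the column types and the limit of five value columns are illustrative too). Unused value columns are left NULL, and `ColumnsUsed` tells the reader how many to take:

```python
import sqlite3

# Option 1: one wide row per sample. Five value columns is an arbitrary,
# illustrative limit; a sensor with more values would need a schema change.
MAX_VALUES = 5

conn = sqlite3.connect(":memory:")
value_cols = ", ".join(f"Value{i} REAL" for i in range(1, MAX_VALUES + 1))
conn.execute(
    f"""CREATE TABLE SensorData (
        SensorId    INTEGER NOT NULL,
        Timestamp   TEXT    NOT NULL,
        ColumnsUsed INTEGER NOT NULL,
        {value_cols},
        PRIMARY KEY (SensorId, Timestamp)
    )"""
)


def insert_sample(sensor_id, timestamp, values):
    """Store one sample; value columns beyond len(values) are left NULL."""
    if len(values) > MAX_VALUES:
        raise ValueError(f"at most {MAX_VALUES} values per sample")
    padded = list(values) + [None] * (MAX_VALUES - len(values))
    placeholders = ", ".join("?" * (3 + MAX_VALUES))  # "?, ?, ..., ?"
    conn.execute(
        f"INSERT INTO SensorData VALUES ({placeholders})",
        [sensor_id, timestamp, len(values), *padded],
    )


def read_sample(sensor_id, timestamp):
    """Return only the populated values of one sample."""
    row = conn.execute(
        "SELECT * FROM SensorData WHERE SensorId = ? AND Timestamp = ?",
        (sensor_id, timestamp),
    ).fetchone()
    columns_used = row[2]
    return list(row[3 : 3 + columns_used])


# Sensor A reads one value, sensor B reads five.
insert_sample(1, "2024-01-01T00:00:00", [21.5])
insert_sample(2, "2024-01-01T00:00:00", [1.0, 2.0, 3.0, 4.0, 5.0])
print(read_sample(1, "2024-01-01T00:00:00"))  # [21.5]
print(read_sample(2, "2024-01-01T00:00:00"))  # [1.0, 2.0, 3.0, 4.0, 5.0]
```

The fixed `MAX_VALUES` is the weak point of this layout: a sensor that starts reporting a sixth value needs an `ALTER TABLE`, and every single-value sensor still carries four NULL columns.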

The other option is to highly normalize the structure and have a data table that uses a composite key to store individual data values. It would track the sensor ID, timestamp, and data type to keep each value unique. This is highly normalized and allows an unlimited number of optional data values per sample, but it duplicates a lot of information (specifically, the sensor ID and timestamp):

Sensor Data
  Sensor ID (PK)
  Timestamp (PK)
  Data Type (PK)
  Value

This wouldn't be that bad for a few thousand samples, but this system is designed to store millions of sensor samples, and joining those values back together could suffer performance problems (i.e., rows WHERE Sensor ID and Timestamp are equal but Data Type is different).
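A minimal sketch of Option 2, again with `sqlite3` standing in for SQL Server (assumption); the `DataType` codes 1 to 3 are hypothetical. Reading one sample needs no self-join: because `SensorId` and `Timestamp` are the leading columns of the composite key, a plain `WHERE` on them returns every value of that sample. When a wide row is wanted, conditional aggregation (`MAX(CASE ...)` with `GROUP BY`) pivots the narrow rows instead of joining the table to itself once per data type:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Option 2: one narrow row per measured value, keyed by (sensor, time, type).
conn.execute(
    """CREATE TABLE SensorData (
        SensorId  INTEGER NOT NULL,
        Timestamp TEXT    NOT NULL,
        DataType  INTEGER NOT NULL,
        Value     REAL    NOT NULL,
        PRIMARY KEY (SensorId, Timestamp, DataType)
    )"""
)

ts = "2024-01-01T00:00:00"
rows = [
    (1, ts, 1, 21.5),                                   # sensor A: one value
    (2, ts, 1, 1.0), (2, ts, 2, 2.0), (2, ts, 3, 3.0),  # sensor B: three values
]
conn.executemany("INSERT INTO SensorData VALUES (?, ?, ?, ?)", rows)


def read_sample(sensor_id, timestamp):
    """All values of one sample as {DataType: Value}; no self-join needed."""
    cur = conn.execute(
        """SELECT DataType, Value FROM SensorData
           WHERE SensorId = ? AND Timestamp = ?
           ORDER BY DataType""",
        (sensor_id, timestamp),
    )
    return dict(cur.fetchall())


def read_wide(sensor_id):
    """Pivot narrow rows into one wide row per timestamp (types 1-3 only)."""
    return conn.execute(
        """SELECT Timestamp,
                  MAX(CASE WHEN DataType = 1 THEN Value END) AS Value1,
                  MAX(CASE WHEN DataType = 2 THEN Value END) AS Value2,
                  MAX(CASE WHEN DataType = 3 THEN Value END) AS Value3
           FROM SensorData
           WHERE SensorId = ?
           GROUP BY Timestamp
           ORDER BY Timestamp""",
        (sensor_id,),
    ).fetchall()


print(read_sample(2, ts))  # {1: 1.0, 2: 2.0, 3: 3.0}
print(read_wide(1))        # [('2024-01-01T00:00:00', 21.5, None, None)]
```

The pivot query has to list the data types it wants as columns, so it suits reporting on known sensor types; code that consumes the `{DataType: Value}` form handles any number of values per sample.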

Does anyone have a better idea for designing a database that stores optional values? Side note: the design has to work with SQL Server and Entity Framework (EF).

I think going with option 2 is not bad, even if the database will have millions of rows. You will only need an index on SensorId and Timestamp.
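To illustrate the claim that an index on `SensorId` and `Timestamp` is enough, here is a minimal `sqlite3` sketch (SQLite stands in for SQL Server, and the 50,000 synthetic rows are made up for the example). The composite primary key `(SensorId, Timestamp, DataType)` already gives SQLite an index whose leading columns are `SensorId` and `Timestamp`, so no separate index is created; `EXPLAIN QUERY PLAN` shows the lookup as an index search rather than a full table scan:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE SensorData (
        SensorId  INTEGER NOT NULL,
        Timestamp INTEGER NOT NULL,  -- e.g. seconds since epoch (illustrative)
        DataType  INTEGER NOT NULL,
        Value     REAL    NOT NULL,
        PRIMARY KEY (SensorId, Timestamp, DataType)
    )"""
)

# 10 sensors x 1,000 timestamps x 5 values = 50,000 synthetic rows.
conn.executemany(
    "INSERT INTO SensorData VALUES (?, ?, ?, ?)",
    (
        (sensor, t, dtype, float(dtype))
        for sensor in range(10)
        for t in range(1000)
        for dtype in range(1, 6)
    ),
)

QUERY = """SELECT DataType, Value FROM SensorData
           WHERE SensorId = ? AND Timestamp = ?"""

# The last column of each EXPLAIN QUERY PLAN row is the human-readable detail.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, (3, 500))]
values = conn.execute(QUERY, (3, 500)).fetchall()

print(plan)         # expect a SEARCH ... USING INDEX step, not a SCAN
print(len(values))  # 5
```

SQL Server's plans look different, but the principle is the same: equality on the leading columns of the composite key lets the engine seek straight to one sample's rows.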

I can think of a different design containing two tables:

**SensorRead**
Id (PK)
SensorId
Timestamp

**SensorData**
Id (PK)
ReadId (FK)
Value
DataType

If you query that schema for the values of a given SensorId and timestamp, the join only involves about 10 rows (assuming the sensor reads 10 data points), as long as SensorRead is indexed on (SensorId, Timestamp) and SensorData on ReadId. So the cost is almost nothing.
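A minimal sketch of this two-table layout in `sqlite3` (standing in for SQL Server; the two indexes and the `store_read`/`load_read` helpers are assumptions added for the example, not part of the answer). `SensorId` and `Timestamp` are stored once per read in `SensorRead`, and each data point references that row through `ReadId`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when on
conn.executescript(
    """
    CREATE TABLE SensorRead (
        Id        INTEGER PRIMARY KEY,
        SensorId  INTEGER NOT NULL,
        Timestamp TEXT    NOT NULL
    );
    CREATE TABLE SensorData (
        Id       INTEGER PRIMARY KEY,
        ReadId   INTEGER NOT NULL REFERENCES SensorRead(Id),
        DataType INTEGER NOT NULL,
        Value    REAL    NOT NULL
    );
    -- Assumed indexes that keep the lookup and the join cheap.
    CREATE UNIQUE INDEX IX_SensorRead_Sensor_Time ON SensorRead(SensorId, Timestamp);
    CREATE INDEX IX_SensorData_ReadId ON SensorData(ReadId);
    """
)


def store_read(sensor_id, timestamp, values):
    """Insert one SensorRead header row plus one SensorData row per value."""
    cur = conn.execute(
        "INSERT INTO SensorRead (SensorId, Timestamp) VALUES (?, ?)",
        (sensor_id, timestamp),
    )
    read_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO SensorData (ReadId, DataType, Value) VALUES (?, ?, ?)",
        [(read_id, dtype, v) for dtype, v in enumerate(values, start=1)],
    )
    return read_id


def load_read(sensor_id, timestamp):
    """Join one read's header row to its data points."""
    return conn.execute(
        """SELECT d.DataType, d.Value
           FROM SensorRead AS r
           JOIN SensorData AS d ON d.ReadId = r.Id
           WHERE r.SensorId = ? AND r.Timestamp = ?
           ORDER BY d.DataType""",
        (sensor_id, timestamp),
    ).fetchall()


ts = "2024-01-01T00:00:00"
store_read(1, ts, [21.5])                          # sensor A: one value
store_read(2, ts, [float(i) for i in range(10)])   # a sensor with 10 data points
print(len(load_read(2, ts)))  # 10
print(load_read(1, ts))       # [(1, 21.5)]
```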

Aside from the question itself: I'm not sure that having multiple columns as the PK will work well with Entity Framework. I've never tried it, but if you decide to go that way, do some research on it.
