
How do you handle database normalization design for optional columns?

I am working on a system that stores sensor data. Most sensors measure a single value but some can measure many values for each sample period. I am trying to keep my database as normalized as possible without suffering performance problems for looking up lots of sample data. My question is how to design the sensor data table to account for optional measured data values. For example, sensor A only reads one value, but sensor B reads 5 values. How do I store both sets of data in the data table?

Option 1 is to create a flat structure with a table that has a bunch of columns (value1, value2, value3...valueN, etc) and a field that records how many columns are used. Functional but bad design in my opinion:

Sensor Data
  Sensor ID (PK)
  Timestamp (PK)
  Columns Used
  Value 1
  Value 2
  Value 3
  ...
  Value n

The other option is to highly normalize the structure and have a data table that uses a composite key to store individual data values. It would track the sensor id, timestamp, and data type to maintain unique values. This is highly normalized and allows for an unlimited number of optional data values per sample, but duplicates a lot of information (specifically, sensor id and timestamp):

Sensor Data
  Sensor ID (PK)
  Timestamp (PK)
  Data Type (PK)
  Value

This wouldn't be that bad for a few thousand samples, but this system is designed to store millions of sensor samples, and joining those values could cause performance problems (i.e., WHERE Sensor ID and Timestamp are equal but the Data Type is different).

Anyone have a better idea for designing a database to store optional values? Side note: the design has to work with SQL Server and Entity Framework (EF).

I think going with option 2 is not bad, even if the database will have millions of rows. You will only need an index on SensorId and Timestamp.
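To make that concrete, here is a minimal sketch of option 2. It assumes Python's built-in sqlite3 module as a stand-in for SQL Server, and the column types, sample rows, and the `read_sample` helper are made up for illustration. With the composite primary key declared in this column order, a lookup by SensorId and Timestamp filters on the key's leading columns.

```python
import sqlite3

# SQLite stands in for SQL Server here; types and sample rows are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE SensorData (
        SensorId  INTEGER NOT NULL,
        Timestamp TEXT    NOT NULL,
        DataType  TEXT    NOT NULL,
        Value     REAL    NOT NULL,
        PRIMARY KEY (SensorId, Timestamp, DataType)
    )
""")

rows = [
    # Sensor 1 reports a single value per sample.
    (1, "2024-01-01T00:00:00", "temperature", 21.5),
    # Sensor 2 reports several values for the same sample period.
    (2, "2024-01-01T00:00:00", "temperature", 19.0),
    (2, "2024-01-01T00:00:00", "humidity", 40.0),
    (2, "2024-01-01T00:00:00", "pressure", 1013.0),
]
conn.executemany("INSERT INTO SensorData VALUES (?, ?, ?, ?)", rows)


def read_sample(sensor_id, timestamp):
    """Return {data_type: value} for one sample (hypothetical helper)."""
    cur = conn.execute(
        "SELECT DataType, Value FROM SensorData "
        "WHERE SensorId = ? AND Timestamp = ?",
        (sensor_id, timestamp),
    )
    return dict(cur.fetchall())


sample_a = read_sample(1, "2024-01-01T00:00:00")
sample_b = read_sample(2, "2024-01-01T00:00:00")
print(sample_a)
print(sample_b)
```

Reading a whole sample this way needs no self-join: all values for one sensor and timestamp come back from a single filtered query, one row per data type.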

I can think of one different design containing two tables:

**SensorRead**
Id (PK)
SensorId
Timestamp

**SensorData**
Id (PK)
ReadId (FK)
Value
DataType

If you query that schema for the values of a given SensorId and timestamp, the join touches only about 10 rows (assuming the sensor reads 10 data points per sample), so the cost is almost nothing.
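A rough sketch of the two-table layout, again using Python's sqlite3 module in place of SQL Server. The UNIQUE constraint, the index on ReadId, the column types, and the `store_read` / `load_read` helpers are my own assumptions for the example, not something the design strictly requires.

```python
import sqlite3

# SQLite stands in for SQL Server; constraints and index names are assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SensorRead (
        Id        INTEGER PRIMARY KEY,
        SensorId  INTEGER NOT NULL,
        Timestamp TEXT    NOT NULL,
        UNIQUE (SensorId, Timestamp)
    );
    CREATE TABLE SensorData (
        Id       INTEGER PRIMARY KEY,
        ReadId   INTEGER NOT NULL REFERENCES SensorRead(Id),
        DataType TEXT    NOT NULL,
        Value    REAL    NOT NULL
    );
    CREATE INDEX IX_SensorData_ReadId ON SensorData (ReadId);
""")


def store_read(sensor_id, timestamp, values):
    """Insert one read plus its data points; returns the new read Id."""
    cur = conn.execute(
        "INSERT INTO SensorRead (SensorId, Timestamp) VALUES (?, ?)",
        (sensor_id, timestamp),
    )
    read_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO SensorData (ReadId, DataType, Value) VALUES (?, ?, ?)",
        [(read_id, data_type, value) for data_type, value in values.items()],
    )
    return read_id


def load_read(sensor_id, timestamp):
    """Join the read to its data points and return {data_type: value}."""
    cur = conn.execute(
        "SELECT d.DataType, d.Value "
        "FROM SensorRead AS r "
        "JOIN SensorData AS d ON d.ReadId = r.Id "
        "WHERE r.SensorId = ? AND r.Timestamp = ?",
        (sensor_id, timestamp),
    )
    return dict(cur.fetchall())


store_read(1, "2024-01-01T00:00:00", {"temperature": 21.5})
store_read(2, "2024-01-01T00:00:00", {"temperature": 19.0, "humidity": 40.0})
loaded = load_read(2, "2024-01-01T00:00:00")
print(loaded)
```

Here the sensor id and timestamp are stored once per read, and each data point carries only the surrogate ReadId, which also gives every table a single-column key.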

Aside from the question itself: I'm not sure that having multiple columns as the primary key will work well with Entity Framework. I've never tried it, so if you decide to go that way, do some research on it first.


 