Table design Cassandra

I am persisting data from a machine which, let's say, has several different sensors.

CREATE TABLE raw_data (
    device_id uuid,
    time timestamp,
    id uuid,
    unit text,
    value double,
    PRIMARY KEY ((device_id, unit), time)
)

I need to know which sensor was in use when the data was sent. I could add a field "sensor_id" and store sensor-related data in another table. The problem with this approach is that I have to store the location of the sensor (A, B, C), which can change. Changing the location in the sensor table would invalidate old data.

I have a feeling that I'm still thinking too much in the relational way. How would you suggest solving this?

Given your table description, I would say that device_id is the identifier (or PK) of the device, but apparently this is not what you have in mind... And IMHO this is the root of your problem.

I don't want to sound pedantic, but I often see that people forget (or do not know) that in the relational model a relation is not (or not only) a relation between tables, but a relation between attributes, i.e. values taken from "domains", including the PK with the PK (cf. Codd's definition of the relational model, which you can easily find on the net). In the relational model a table is a relation, and a query (a SELECT in SQL, including joins) is also a relation. Even with NoSQL, entities should (IMHO) follow at least the first three normal forms (atomicity and dependence on the PK, for short), which are more or less minimal common-sense modeling.

About the PK: in the relational model there are heated debates on natural versus surrogate (artificial, generated) primary keys. I tend to prefer natural, and often composite, keys, but this is just an opinion, and of course it depends on context.

In your data model, unit should not (IMHO) be part of the PK: it does not identify the device, it is a characteristic of the device. The PK must uniquely identify the device; it is not a position or location, a unit, or any other characteristic of the device. It is a unique id, a serial number, or a combination of other characteristics that is unique to the device and does not change over time or along any other dimension.

For example, in the case of cars with embedded devices, you have the choice between giving each embedded device an opaque uuid PK, with a reference table to retrieve additional information about the device, or using a composite PK, which could be made of: car maker, car serial number (sno), device type, device id. For example:

CREATE TABLE raw_data (
    car_maker text,
    car_sno text,
    device_type text,
    device_id text,
    time timestamp,
    id uuid,
    unit text,
    value double,
    PRIMARY KEY ((car_maker, car_sno, device_type, device_id), time)
)

Example data (column order: car_maker, car_sno, device_type, device_id, unit, time, value):

( 'bmw', '1256387A1AA43', 'tyrep', 'tyre1', 'bar',   150056709xxx, 2.4 ),
( 'bmw', '1256387A1AA43', 'tyrec', 'tyre1', 'tempC', 150056709xxx, 150 ),
( 'bmw', '1256387A1AA43', 'tyrep', 'tyre2', 'bar',   150056709xxx, 2.45 ),
( 'bmw', '1256387A1AA43', 'tyrec', 'tyre2', 'tempC', 150056709xxx, 160 ),
( 'bmw', '1256387A1AA43', 'tyrep', 'tyre3', 'bar',   150056709xxx, 2.5 ),
( 'bmw', '1256387A1AA43', 'tyrec', 'tyre3', 'tempC', 150056709xxx, 150 ),
( 'bmw', '1256387A1AA43', 'tyrep', 'tyre4', 'bar',   150056709xxx, 2.42 ),
( 'bmw', '1256387A1AA43', 'tyrec', 'tyre4', 'tempC', 150056709xxx, 150 ),

This is a general thought and must be adapted to your problem. Sometimes uuids and calculated keys are best.

With Cassandra the difficulty is that you have to design your model around your queries, because the first part of the PK is the partition key and you cannot query across multiple partitions (or only with difficulty: you have to paginate or use another system such as Spark).
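To make the partition-key constraint a bit more concrete, here is a minimal in-memory sketch in Python. It is not Cassandra driver code and only imitates the idea: rows are grouped by the partition key of the table above and ordered by the clustering column `time`. The names `Table`, `insert` and `read_partition`, and the small integer timestamps, are hypothetical and chosen for illustration only.

```python
from collections import defaultdict

# Partition key columns, mirroring
# PRIMARY KEY ((car_maker, car_sno, device_type, device_id), time)
PARTITION_COLS = ("car_maker", "car_sno", "device_type", "device_id")


class Table:
    """Toy model of a table: one {time: row} mapping per partition key."""

    def __init__(self):
        self.partitions = defaultdict(dict)

    def insert(self, row):
        key = tuple(row[c] for c in PARTITION_COLS)
        # Same partition key and same clustering value: the row is overwritten.
        self.partitions[key][row["time"]] = row

    def read_partition(self, key, start, end):
        """Return the rows of ONE partition with start <= time <= end, by time."""
        rows = self.partitions.get(key, {})
        return [rows[t] for t in sorted(rows) if start <= t <= end]


table = Table()
car = {"car_maker": "bmw", "car_sno": "1256387A1AA43"}
table.insert({**car, "device_type": "tyrep", "device_id": "tyre1",
              "time": 1, "unit": "bar", "value": 2.4})
table.insert({**car, "device_type": "tyrep", "device_id": "tyre1",
              "time": 2, "unit": "bar", "value": 2.5})
table.insert({**car, "device_type": "tyrec", "device_id": "tyre1",
              "time": 1, "unit": "tempC", "value": 150})

# A read needs the complete partition key; pressure and temperature
# readings live in different partitions.
key = ("bmw", "1256387A1AA43", "tyrep", "tyre1")
readings = [r["value"] for r in table.read_partition(key, 1, 2)]
```

In this toy model, reading the pressure history of one tyre is a single lookup, while "all readings of the car" would have to visit every partition, which is the kind of query the schema has to be designed for up front.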

Don't think relational too much, and don't be afraid to duplicate. I would also suggest having a look at Chebotko diagrams for Cassandra, which can help you design your Cassandra schema around queries, here or here.

Best,

Alain
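One way to read "don't be afraid to duplicate" for the sensor-location problem in the question is to copy the sensor's location into every reading at write time, so that moving the sensor later does not change what old rows say. The Python sketch below only illustrates that idea in memory; the `sensor_id` and `sensor_location` fields, the `current_location` lookup and the `record` helper are assumptions made for this example and are not part of the schema above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    device_id: str
    time: int
    unit: str
    value: float
    sensor_id: str
    sensor_location: str  # duplicated into the row when it is written


# Hypothetical "current state" lookup; it may change at any time.
current_location = {"sensor-1": "A"}


def record(device_id, time, unit, value, sensor_id):
    """Build a reading that carries a snapshot of the sensor's location."""
    return Reading(device_id, time, unit, value,
                   sensor_id, current_location[sensor_id])


old = record("dev-1", 1, "bar", 2.4, "sensor-1")
current_location["sensor-1"] = "B"  # the sensor is moved
new = record("dev-1", 2, "bar", 2.45, "sensor-1")
```

Here `old` still reports location A after the move, at the cost of storing the location once per reading.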
