
Table design in Cassandra

I am persisting data from a machine that has, let's say, several different sensors.

CREATE TABLE raw_data (
    device_id uuid,
    time timestamp,
    id uuid,
    unit text,
    value double,
    PRIMARY KEY ((device_id, unit), time)
);

I need to know which sensor was used when the data was sent. I could add a field "sensor_id" and store sensor-related data in another table. The problem with this approach is that I also have to store the location of the sensor (A, B, C), which can change. Changing the location in the sensor table would invalidate old data.

I have a feeling that I'm still thinking too much in the relational way. How would you suggest solving this?

Given your table description, I would say that device_id is the identifier (or PK) of the device, but this is apparently not how you are thinking about it, and IMHO this is the root of your problem.

I don't want to sound pedantic, but I often see people forget (or not know) that in the relational model a relation is not (or not only) a relationship between tables, but a relation between attributes, i.e. values taken from "domains", the PK included (cf. Codd's definition of the relational model, which you can easily find on the net). In the relational model a table is a relation, and a query (a SELECT in SQL, including joins) is also a relation. Even with NoSQL, entities should (IMHO) follow at least the first three normal forms (for short: atomicity and dependence on the PK), which are more or less minimal common-sense modeling.

About PKs: in the relational model there are heated debates about natural versus surrogate (artificial, generated) primary keys. I tend to prefer natural, and often composite, keys, but this is just an opinion, and of course it depends on context.

In your data model, unit should not (IMHO) be part of the PK: it does not identify the device, it is a characteristic of the device. The PK must uniquely identify the device; it is not a position or location, a unit, or any other characteristic of the device. It is a unique id, a serial number, or a combination of other characteristics which is unique for the device and does not change over time or along any other dimension.
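To make the uniqueness point concrete with the table from the question: a CQL INSERT is an upsert, so two rows that share the full primary key (device_id, unit, time) do not coexist; the later write replaces the earlier one. A minimal sketch, assuming the question's raw_data table; the uuid literal and timestamp are hypothetical values chosen for illustration.

```cql
-- Two readings from two different sensors of the same device, same unit, same instant.
-- Both rows have the same primary key in raw_data: ((device_id, unit), time).
INSERT INTO raw_data (device_id, unit, time, id, value)
VALUES (123e4567-e89b-12d3-a456-426614174000, 'bar', '2017-07-20 16:11:30+0000', uuid(), 2.4);

INSERT INTO raw_data (device_id, unit, time, id, value)
VALUES (123e4567-e89b-12d3-a456-426614174000, 'bar', '2017-07-20 16:11:30+0000', uuid(), 2.5);

-- Only one row comes back: the second INSERT (later write timestamp) overwrote the first.
SELECT id, value
FROM raw_data
WHERE device_id = 123e4567-e89b-12d3-a456-426614174000
  AND unit = 'bar'
  AND time = '2017-07-20 16:11:30+0000';
```

In other words, as long as the sensor identity is not part of the key, readings from two sensors with the same unit can silently replace each other.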

For example, in the case of cars with embedded devices, you have the choice between giving each embedded device an opaque uuid PK, with a reference table to retrieve additional information about the device, and a composite PK made of car maker, car serial number (sno), device type and device id, for example:

CREATE TABLE raw_data (
    car_maker text,
    car_sno text,
    device_type text,
    device_id text,
    time timestamp,
    id uuid,
    unit text,
    value double,
    PRIMARY KEY ((car_maker, car_sno, device_type, device_id), time)
);

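Example data (columns in the order car_maker, car_sno, device_type, device_id, unit, time, value; the id column is omitted):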

( 'bmw', '1256387A1AA43', 'tyrep', 'tyre1', 'bar', 150056709xxx, 2.4 ),
( 'bmw', '1256387A1AA43', 'tyrec', 'tyre1', 'tempC',150056709xxx, 150 ),
( 'bmw', '1256387A1AA43', 'tyrep', 'tyre2', 'bar', 150056709xxx,2.45 ),
( 'bmw', '1256387A1AA43', 'tyrec', 'tyre2', 'tempC', 150056709xxx, 160),
( 'bmw', '1256387A1AA43', 'tyrep', 'tyre3', 'bar', 150056709xxx,2.5 ),
( 'bmw', '1256387A1AA43', 'tyrec', 'tyre3', 'tempC', 150056709xxx, 150 ),
( 'bmw', '1256387A1AA43', 'tyre', 'tyre4', 'bar', 150056709xxx,2.42 ),
( 'bmw', '1256387A1AA43', 'tyre', 'tyre4', 'tempC', 150056709xxx, 150 ),

This is a general idea and must be adapted to your problem. Sometimes uuids and generated keys are best.
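For comparison, the opaque-uuid alternative described earlier (a uuid per embedded device plus a reference table) might look roughly like this. It is a sketch only: the table names device and raw_data_by_device and their columns are hypothetical.

```cql
-- Hypothetical reference table: one row per embedded device, keyed by an opaque uuid.
-- Descriptive attributes live here instead of in the key of the readings table.
CREATE TABLE device (
    device_id uuid PRIMARY KEY,
    car_maker text,
    car_sno text,
    device_type text,
    unit text
);

-- Hypothetical readings table keyed only by the opaque device id, newest readings first.
CREATE TABLE raw_data_by_device (
    device_id uuid,
    time timestamp,
    value double,
    PRIMARY KEY ((device_id), time)
) WITH CLUSTERING ORDER BY (time DESC);
```

Cassandra has no joins, so with this design the application reads the device row and the readings separately; the composite natural key avoids that second lookup at the cost of a wider partition key.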

With Cassandra, the difficulty is that you have to design your model around your queries, because the first part of the PK is the partition key, and you cannot efficiently query across multiple partitions (or it is difficult: you have to paginate or use another system such as Spark).
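Against the composite-key raw_data table above, the difference looks like this. The first query fixes every partition key column by equality and only ranges over the clustering column time, so it reads a single partition. The second restricts only part of the partition key; depending on the Cassandra version it is either rejected outright or accepted only with ALLOW FILTERING, in which case it has to scan partitions across the cluster. The dates are illustrative values.

```cql
-- Single-partition read: all partition key columns are restricted with '=',
-- and the clustering column (time) is restricted by a range.
SELECT time, unit, value
FROM raw_data
WHERE car_maker = 'bmw'
  AND car_sno = '1256387A1AA43'
  AND device_type = 'tyrep'
  AND device_id = 'tyre1'
  AND time >= '2017-07-20 00:00:00+0000'
  AND time <  '2017-07-21 00:00:00+0000';

-- Only part of the partition key is restricted: Cassandra cannot locate one partition,
-- so this needs ALLOW FILTERING (or is refused on older versions) and scans the cluster.
SELECT time, value
FROM raw_data
WHERE device_type = 'tyrep'
ALLOW FILTERING;
```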

Don't think too relationally, and don't be afraid to duplicate data. I would also suggest having a look at Chebotko diagrams for Cassandra, which can help you design your schema around your queries: here or here.
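Applied to the sensor-location concern in the question, "duplicate" can simply mean copying the location that is valid at write time into every reading, so moving a sensor later only affects new rows and never rewrites history. A minimal sketch, assuming a hypothetical table raw_data_with_location and a hypothetical location column; the inserted values are illustrative.

```cql
-- Hypothetical variant of raw_data with the sensor location denormalized into each row.
CREATE TABLE raw_data_with_location (
    car_maker text,
    car_sno text,
    device_type text,
    device_id text,
    time timestamp,
    location text,  -- e.g. 'A', 'B' or 'C': where the sensor was when this reading was taken
    unit text,
    value double,
    PRIMARY KEY ((car_maker, car_sno, device_type, device_id), time)
);

-- The writer copies the current location into the reading; old rows keep their old location.
INSERT INTO raw_data_with_location
    (car_maker, car_sno, device_type, device_id, time, location, unit, value)
VALUES ('bmw', '1256387A1AA43', 'tyrep', 'tyre1', '2017-07-20 16:11:30+0000', 'A', 'bar', 2.4);
```

The trade-off is a few extra bytes per row in exchange for never needing a join or a history lookup to answer "where was this sensor when it sent this value?".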

best,

Alain
