
Single data column vs multiple columns in Cassandra

I'm working on a project with an existing Cassandra database. The schema looks like this:

partition key (bigint) | clustering key1 (timestamp) | data (text)
1                      | 2021-03-10 11:54:00.000     | {a:"somedata", b:2, ...}

My question is: is there any advantage to storing the data as a JSON string? Will it save space?

So far I have found only disadvantages:

  • You cannot (easily) add/drop columns at runtime, since the application could overwrite the JSON string column.
  • Parsing the JSON string is currently the performance bottleneck.

No, there is no real advantage to storing JSON as a string in Cassandra unless the underlying data in the JSON is really schema-less. It will not save space either; it will in fact use more, because every row has to store each key together with its value instead of just the value.
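
A rough way to see the space argument is to compare the bytes of a JSON string with the bytes of its values alone. This is only a back-of-the-envelope sketch: the payload reuses the two fields from the question, and it ignores Cassandra's per-cell metadata, native type encodings and on-disk compression, all of which change the real numbers.

```python
import json

# Hypothetical payload, modelled on the example row in the question.
record = {"a": "somedata", "b": 2}

# A single text column stores keys, quotes and punctuation on every row.
json_blob = json.dumps(record, separators=(",", ":"))

# With native columns the names live in the schema, so a row only carries values.
# (Values are rendered as text here purely to keep the comparison simple.)
values_only = "".join(str(v) for v in record.values())

json_bytes = len(json_blob.encode("utf-8"))
value_bytes = len(values_only.encode("utf-8"))

print(f"JSON blob: {json_bytes} bytes, values only: {value_bytes} bytes")
```

The gap grows with the number of keys and with the length of the key names, since both are repeated in every row.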

If you can, I would recommend mapping the keys to CQL columns so you can store the values natively and accessing the data is more flexible. Cheers!
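
As a sketch of what that mapping could look like, the snippet below derives a CREATE TABLE statement from one sample JSON document. The table name `events`, the column names `id` and `ts`, and the Python-to-CQL type mapping are all assumptions made for illustration; the question does not give the real names, and a real schema should be designed by hand around the queries it has to serve.

```python
import json

def cql_type(value):
    """Pick a CQL type for a sample JSON value (simplified, illustrative mapping)."""
    if isinstance(value, bool):   # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "bigint"
    if isinstance(value, float):
        return "double"
    return "text"

def create_table_cql(table, sample):
    """Build a CREATE TABLE statement with one native column per JSON key."""
    lines = [f"CREATE TABLE {table} (", "    id bigint,", "    ts timestamp,"]
    lines += [f"    {name} {cql_type(value)}," for name, value in sample.items()]
    lines += ["    PRIMARY KEY (id, ts)", ");"]
    return "\n".join(lines)

# Hypothetical sample document, modelled on the question's example row.
sample = json.loads('{"a": "somedata", "b": 2}')
ddl = create_table_cql("events", sample)
print(ddl)
```

With native columns, a query can select or filter on `a` and `b` individually, and the application no longer has to parse a string for every row it reads.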

Erick is spot-on-correct with his answer.

The only thing I'd add is that storing JSON blobs in a single column makes updates (even more) problematic. If you update a single JSON property, the whole column gets rewritten. The original JSON blob is also still there, just "obsoleted", until compaction runs. The only time that storing a JSON blob in a single column makes any sense is when the properties don't change.
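
The difference in update cost can be sketched from the application's side. The helpers below are hypothetical, as are the table name `events` and the key columns `id` and `ts`; they only build the statement text and the value to bind, and do not talk to a cluster.

```python
import json

def update_json_blob(stored_blob, key, value):
    """Changing one property means read, parse, modify, and write back the whole blob."""
    doc = json.loads(stored_blob)
    doc[key] = value
    new_blob = json.dumps(doc, separators=(",", ":"))
    # The full serialized document is bound to the single data column.
    return "UPDATE events SET data = ? WHERE id = ? AND ts = ?", new_blob

def update_native_column(column, value):
    """With native columns only the changed cell is written."""
    return f"UPDATE events SET {column} = ? WHERE id = ? AND ts = ?", value

blob_stmt, blob_value = update_json_blob('{"a":"somedata","b":2}', "b", 3)
col_stmt, col_value = update_native_column("b", 3)

print(blob_stmt, "->", blob_value)
print(col_stmt, "->", col_value)
```

The blob variant also needs a read before every write, so two clients changing different properties at the same time can silently overwrite each other.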

And I agree, mapping the keys to CQL columns is a much better option.

I don't disagree with the excellent and already accepted answer by @erick-ramirez.

However, there is often a good case to be made for using frozen UDTs instead of separate columns for related data that is only ever going to be set and retrieved together and will not be filtered on in your queries.

The "frozen" part is important: it means less work for Cassandra, but it also means that you rewrite the whole value on each update.
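
To make the frozen UDT option concrete, here is a minimal sketch that builds the CQL for a type, a table using it, and a full-value update. The names `payload` and `events_udt` and the fields `a` and `b` are assumptions based on the question's example, and the literal renderer only handles strings and integers.

```python
def cql_literal(value):
    """Render a Python value as a CQL literal (strings and ints only in this sketch)."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"  # escape single quotes by doubling
    return str(value)

def udt_literal(fields):
    """Render a dict as a CQL user-defined type literal, e.g. {a: 'x', b: 1}."""
    inner = ", ".join(f"{name}: {cql_literal(v)}" for name, v in fields.items())
    return "{" + inner + "}"

# Hypothetical schema: one frozen UDT column instead of a JSON text column.
CREATE_TYPE = "CREATE TYPE payload (a text, b bigint);"
CREATE_TABLE = (
    "CREATE TABLE events_udt ("
    "id bigint, ts timestamp, data frozen<payload>, "
    "PRIMARY KEY (id, ts));"
)

def full_rewrite_cql(fields):
    """A frozen UDT cannot be updated field by field, so the whole value is written."""
    return (
        f"UPDATE events_udt SET data = {udt_literal(fields)} "
        "WHERE id = 1 AND ts = '2021-03-10 11:54:00.000';"
    )

update_stmt = full_rewrite_cql({"a": "somedata", "b": 3})
print(CREATE_TYPE)
print(CREATE_TABLE)
print(update_stmt)
```

Compared with a JSON string, the field names and types live in the schema rather than in every row, and the driver hands back typed values without any parsing in the application.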

This can give a large performance boost compared with using a large number of separate columns. The nice ScyllaDB people have a great post on that:

If You Care About Performance, Employ User Defined Types

(I know ScyllaDB is not exactly Cassandra, but I've seen multiple articles that say the same thing about Cassandra.)

One downside is that you add work to the application layer and sometimes mapping complex UDTs to your Java types will be interesting.
