
Cassandra or PostgreSQL: High volume of Inserts per minute

Here is my scenario:

  1. I have 100,000+ tables.
  2. I have to insert into each table every minute, i.e. 100,000+ inserts per minute, all into separate tables.
  3. Data loss doesn't matter much, but speed and cost do.
  4. The insertion fields would be id, param1, param2, param3, param4, param5, timestamp.

Please let me know which database would be faster and cheaper for this case.

Cassandra may face serious scalability issues with 100,000 separate tables. That many tables means some multiple of 100,000 open files (so you'll need to make sure your kernel is configured to allow that many open file descriptors) and 100,000 memtables (where the latest modifications to each table are temporarily kept in memory), so you'll also need a lot of memory.

An alternative way to do something like this in Cassandra is to have one table with 100,000 different partitions (which is the Cassandra name for wide rows). Each minute you'd add one further row (a small entry) to each of the existing partitions. To keep partitions from growing huge after, say, months of adding entries, the usual approach is to start a new partition roughly every week (a week has about 10,000 minutes). In Cassandra modelling this is often called "time series data".
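For illustration, here is a minimal sketch of what such a schema could look like with the DataStax Python driver. The keyspace name, the numeric column types, and the idea of turning each original table's identity into a table_id column are assumptions, not something stated in the question:

```python
# Sketch only: one Cassandra table, one partition per (logical table, week),
# with one row appended per minute. Assumes a local node and an existing
# keyspace named "metrics_ks" (both hypothetical).
from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("metrics_ks")

session.execute("""
    CREATE TABLE IF NOT EXISTS metrics (
        table_id int,         -- which of the 100,000 logical tables
        week     date,        -- weekly bucket keeps partitions bounded
        ts       timestamp,   -- the per-minute timestamp
        param1 double, param2 double, param3 double,
        param4 double, param5 double,
        PRIMARY KEY ((table_id, week), ts)
    ) WITH CLUSTERING ORDER BY (ts DESC)
""")
```

Each (table_id, week) pair is one partition, so a partition never collects more than roughly 10,000 rows before the week rolls over.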

In your question, you only mentioned writing data, and not reading it. Assuming this is not an oversight and you really care more about write performance than read performance, then Cassandra is a good fit because it is especially fast for writes. If you absolutely care about speed and performance-per-dollar, you should also take a look at Scylla, a re-implementation of Cassandra in C++.
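If write throughput is the deciding factor, the per-minute load might be issued with a prepared statement and asynchronous execution, roughly as sketched below. This builds on the hypothetical metrics table above; write_minute and its input format are made up for illustration:

```python
# Sketch only: push one minute's worth of rows with async writes.
# In practice you would throttle the number of in-flight requests
# rather than launching 100,000 futures at once.
import datetime

from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("metrics_ks")

insert = session.prepare("""
    INSERT INTO metrics (table_id, week, ts,
                         param1, param2, param3, param4, param5)
    VALUES (?, ?, ?, ?, ?, ?, ?, ?)
""")

def week_bucket(ts: datetime.datetime) -> datetime.date:
    # Truncate the timestamp to the Monday of its week.
    return (ts - datetime.timedelta(days=ts.weekday())).date()

def write_minute(rows):
    """rows: iterable of (table_id, p1, p2, p3, p4, p5) tuples."""
    now = datetime.datetime.now(datetime.timezone.utc)
    week = week_bucket(now)
    futures = [
        session.execute_async(insert, (tid, week, now, p1, p2, p3, p4, p5))
        for tid, p1, p2, p3, p4, p5 in rows
    ]
    # The question says some data loss is acceptable, so failures are
    # only counted here, not retried.
    failed = 0
    for f in futures:
        try:
            f.result()
        except Exception:
            failed += 1
    return failed
```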

It sounds like your data fits a time-series model, and TimescaleDB may handle it, possibly with its newer distributed (multi-node) setup. The 100,000 tables would become just one more indexed field: keep the data in time order in a single table and enable compression. You may also want to consider index types other than B-trees rather than restricting yourself to them.
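A rough sketch of that layout with TimescaleDB and psycopg2 follows. The database name, chunk interval, and compression-policy age are arbitrary assumptions; the columns mirror the fields listed in the question:

```python
# Sketch only: a single hypertable instead of 100,000 tables; the former
# table identity becomes the indexed table_id column. Assumes the
# TimescaleDB extension is already installed in the database.
import psycopg2

conn = psycopg2.connect("dbname=metrics user=postgres")
conn.autocommit = True
cur = conn.cursor()

cur.execute("""
    CREATE TABLE IF NOT EXISTS metrics (
        table_id integer     NOT NULL,   -- which logical table
        ts       timestamptz NOT NULL,
        param1 double precision, param2 double precision,
        param3 double precision, param4 double precision,
        param5 double precision
    )
""")

# Partition by time; the one-week chunk size is a guess, not a recommendation.
cur.execute("""
    SELECT create_hypertable('metrics', 'ts',
                             chunk_time_interval => INTERVAL '1 week',
                             if_not_exists => TRUE)
""")

# Keep per-table lookups in time order without one B-tree per logical table.
cur.execute("""
    CREATE INDEX IF NOT EXISTS metrics_table_ts
        ON metrics (table_id, ts DESC)
""")

# Enable native compression, segmented by the logical table id, and
# compress chunks once they are a week old.
cur.execute("""
    ALTER TABLE metrics SET (
        timescaledb.compress,
        timescaledb.compress_segmentby = 'table_id'
    )
""")
cur.execute("SELECT add_compression_policy('metrics', INTERVAL '7 days')")
```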

Our tests on finance data showed impressive compression ratios, especially when all the tables hold similar data for nearby time periods, e.g. cumulative and scaled values across 3-4k instruments. We didn't try 100k tables, but you may want to run some benchmarks, see where the limit is, and in case of steep degradation shard out to different machines/clusters.

Maintenance may be a bit problematic if you decide to manage multiple manually sharded servers, but a single box can work cost magic compared to a modern cluster. Multiple powerful but isolated boxes can be used if data loss can be tolerated, e.g. if the data can be replayed from a different source in reasonable time (like an efficient market-data replay from archives).
