
Cassandra performance with a very high number of columns per row

I am considering storing data with between 100 and 250 million columns per row, with at most 2-3k rows in a column family. I will use composite columns to allow slicing the data, and will limit the slice range to a reasonable value that can be handled within process memory limits.

One CF will have no column values, just column names (100-250 million of them); the other CF will have the same number of columns, but with approximately 20-30 KB of data per column value.
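For reference, here is a sketch of how the two CFs might be expressed in modern CQL, where the old wide row maps to a partition and the composite column name maps to a clustering column (all table and column names below are hypothetical):

```sql
-- Hypothetical CQL rendering of the two column families.
-- The old row key becomes the partition key; the composite
-- column name becomes the clustering column.

-- CF 1: column names only, no values.
CREATE TABLE names_only (
    row_key  text,
    name     text,          -- the composite column name
    PRIMARY KEY (row_key, name)
);

-- CF 2: same names, plus a ~20-30 KB payload per column.
CREATE TABLE names_with_data (
    row_key  text,
    name     text,
    payload  blob,          -- approx 20-30 KB per value
    PRIMARY KEY (row_key, name)
);
```

Note that 250 million columns at 20-30 KB each is on the order of 5-7 TB in a single row, which is why such rows are often split by adding an extra bucketing component to the key.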

I assume slicing does not require Cassandra to load all column names into memory in order to slice the data.
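That assumption is essentially right: column names within a row are stored sorted, so a slice is a seek plus a bounded scan. The usual pattern for walking a very wide row is to page through it, restarting each slice just past the last column seen. A minimal in-memory simulation of that pattern (all names and sizes here are made up):

```python
import bisect

def slice_columns(sorted_names, start, limit):
    """Return up to `limit` column names >= `start`, like a
    Cassandra range slice over one row: names are kept sorted,
    so the server never materialises the whole row."""
    i = bisect.bisect_left(sorted_names, start)
    return sorted_names[i:i + limit]

def iterate_row(sorted_names, page_size):
    """Walk an arbitrarily wide row in fixed-size pages, restarting
    each slice just past the last column seen."""
    start = ""
    while True:
        page = slice_columns(sorted_names, start, page_size)
        if not page:
            return
        yield page
        start = page[-1] + "\x00"   # smallest name strictly greater

# Simulated wide row with 10,000 columns.
row = [f"col{i:08d}" for i in range(10_000)]
pages = list(iterate_row(row, 1000))
print(len(pages), sum(len(p) for p in pages))  # prints: 10 10000
```

Only `page_size` columns are held in memory at a time, which is exactly the "reasonable slice range" constraint described above.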

Only about 5% of rows will have such a high column count; the rest will have at most 15-20 million columns.

Has anyone tried such a large number of columns per row in a column family, and how was the performance?

If the above works fine, it saves me a great deal of work managing multiple CFs.

Thanks

I have worked with data volumes close to those you describe. A range slice is not very fast, but it doesn't get much slower as the data size grows, apart from the overhead of Cassandra having to return more columns. However, the fastest way to query is when you know in advance all the column names you want.
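The point about knowing the names in advance can be illustrated: a by-name fetch resolves each requested column directly, while a range slice has to scan between its bounds. A toy in-memory comparison (names here are hypothetical; the slice is simulated naively by filtering all names):

```python
# Simulate one wide row as a mapping from column name to value.
row = {f"col{i:06d}": i for i in range(100_000)}

def get_by_names(row, names):
    """By-name fetch: one O(1) lookup per requested column,
    independent of how wide the row is."""
    return {n: row[n] for n in names if n in row}

def range_slice(row, start, end):
    """Range slice over names in [start, end); in Cassandra this is
    a seek plus a scan, simulated here by filtering every name."""
    return {n: v for n, v in sorted(row.items()) if start <= n < end}

print(get_by_names(row, ["col000042", "col099999"]))
# prints: {'col000042': 42, 'col099999': 99999}
```

The by-name path stays cheap no matter how many columns the row holds, which matches the observation that slices mainly pay for the columns they return.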

Your setup has almost no downside: you are not using supercolumns and you have a flat data structure, which is exactly what Cassandra is good at; after all, it is a key-value store.
