Cassandra bulk insert solution

I have a Java program that runs as a service; it must insert 50k rows/s (each row has 25 columns) into a Cassandra cluster.

My cluster contains 3 nodes; each node has a 4-core CPU (Core i5, 2.4 GHz) and 4 GB of RAM.

I used the Hector API with multithreading and batch inserts, but the throughput is lower than expected (about 25k rows/s).

Does anyone have a suggestion for another solution? Does Cassandra support an internal bulk insert (without using Thrift)?
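
For reference, a typical Hector batch insert (roughly the pattern the question describes) looks like the sketch below; the cluster name, seed address, keyspace and column family names are placeholders, not taken from the question:

    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.mutation.Mutator;

    public class HectorBatchInsert {
        public static void main(String[] args) {
            // Placeholder cluster name, seed address, keyspace and column family.
            Cluster cluster = HFactory.getOrCreateCluster("TestCluster", "10.0.0.1:9160");
            Keyspace keyspace = HFactory.createKeyspace("Keyspace1", cluster);

            // A Mutator batches many insertions into a single Thrift batch_mutate call.
            Mutator<String> mutator = HFactory.createMutator(keyspace, StringSerializer.get());
            for (int i = 0; i < 1000; i++) {
                String rowKey = "row" + i;
                for (int c = 0; c < 25; c++) {
                    mutator.addInsertion(rowKey, "Users",
                            HFactory.createStringColumn("col" + c, "value" + c));
                }
            }
            mutator.execute(); // one round trip for the whole batch
        }
    }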

Astyanax is a high-level Java client for Apache Cassandra, a highly available, column-oriented database. Astyanax is currently in use at Netflix; issues are generally fixed as quickly as possible and releases are done frequently.

https://github.com/Netflix/astyanax
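
A minimal Astyanax batch-write sketch, assuming a Thrift-based setup with a recent Astyanax 1.x (older versions use context.getEntity() instead of getClient()); the seed address, keyspace and column family names below are placeholders:

    import com.netflix.astyanax.AstyanaxContext;
    import com.netflix.astyanax.Keyspace;
    import com.netflix.astyanax.MutationBatch;
    import com.netflix.astyanax.connectionpool.NodeDiscoveryType;
    import com.netflix.astyanax.connectionpool.impl.ConnectionPoolConfigurationImpl;
    import com.netflix.astyanax.connectionpool.impl.CountingConnectionPoolMonitor;
    import com.netflix.astyanax.impl.AstyanaxConfigurationImpl;
    import com.netflix.astyanax.model.ColumnFamily;
    import com.netflix.astyanax.serializers.StringSerializer;
    import com.netflix.astyanax.thrift.ThriftFamilyFactory;

    public class AstyanaxBatchInsert {
        public static void main(String[] args) throws Exception {
            // Placeholder cluster/keyspace names and seed address.
            AstyanaxContext<Keyspace> context = new AstyanaxContext.Builder()
                    .forCluster("TestCluster")
                    .forKeyspace("Keyspace1")
                    .withAstyanaxConfiguration(new AstyanaxConfigurationImpl()
                            .setDiscoveryType(NodeDiscoveryType.RING_DESCRIBE))
                    .withConnectionPoolConfiguration(new ConnectionPoolConfigurationImpl("pool")
                            .setPort(9160)
                            .setMaxConnsPerHost(10)
                            .setSeeds("10.0.0.1:9160"))
                    .withConnectionPoolMonitor(new CountingConnectionPoolMonitor())
                    .buildKeyspace(ThriftFamilyFactory.getInstance());
            context.start();
            Keyspace keyspace = context.getClient();

            ColumnFamily<String, String> cf = new ColumnFamily<String, String>(
                    "Users", StringSerializer.get(), StringSerializer.get());

            // A MutationBatch groups many rows and columns into one request.
            MutationBatch batch = keyspace.prepareMutationBatch();
            for (int i = 0; i < 1000; i++) {
                batch.withRow(cf, "row" + i)
                        .putColumn("col1", "value1", null)
                        .putColumn("col2", "value2", null);
            }
            batch.execute();

            context.shutdown();
        }
    }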

I've had good luck creating SSTables and loading them directly. There is an sstableloader tool included in the distribution, as well as a JMX interface. You can create the SSTables using the SSTableSimpleUnsortedWriter class.

Details here.
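
A minimal sketch of writing SSTables with SSTableSimpleUnsortedWriter, assuming the Cassandra 0.8/1.x API; the keyspace, column family, output directory and comparator below are placeholders. The generated files can then be streamed into the cluster with sstableloader:

    import java.io.File;
    import java.io.IOException;

    import org.apache.cassandra.db.marshal.AsciiType;
    import org.apache.cassandra.dht.RandomPartitioner;
    import org.apache.cassandra.io.sstable.SSTableSimpleUnsortedWriter;

    import static org.apache.cassandra.utils.ByteBufferUtil.bytes;

    public class SSTableBulkWriter {
        public static void main(String[] args) throws IOException {
            // Placeholder output directory; sstableloader expects the sstables
            // to sit under a keyspace/column-family directory layout.
            File directory = new File("/tmp/bulk/Keyspace1/Users");
            directory.mkdirs();

            SSTableSimpleUnsortedWriter writer = new SSTableSimpleUnsortedWriter(
                    directory,
                    new RandomPartitioner(),
                    "Keyspace1",          // keyspace
                    "Users",              // column family
                    AsciiType.instance,   // column name comparator
                    null,                 // no sub-comparator (not a super column family)
                    64);                  // buffer size in MB before flushing an sstable

            long timestamp = System.currentTimeMillis() * 1000; // microseconds
            for (int i = 0; i < 1000; i++) {
                writer.newRow(bytes("row" + i));
                for (int c = 0; c < 25; c++) {
                    writer.addColumn(bytes("col" + c), bytes("value" + c), timestamp);
                }
            }
            writer.close(); // flushes the remaining buffer to disk

            // Afterwards, stream the files into the cluster with:
            //   bin/sstableloader /tmp/bulk/Keyspace1/Users
        }
    }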

The fastest way to bulk-insert data into Cassandra is sstableloader, a utility provided with Cassandra from 0.8 onwards. For that you have to create SSTables first, which is possible with SSTableSimpleUnsortedWriter; more about this is described here.

Another fast way is Cassandra's BulkOutputFormat for Hadoop. With this you can write a Hadoop job that loads data into Cassandra; see more on bulk loading to Cassandra with Hadoop, and the configuration sketch below.
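
A hedged sketch of the Hadoop driver configuration for BulkOutputFormat (the classes come from Cassandra's org.apache.cassandra.hadoop package; the seed address, RPC port, keyspace and column family are placeholders, and the mapper/reducer wiring is omitted):

    import org.apache.cassandra.hadoop.BulkOutputFormat;
    import org.apache.cassandra.hadoop.ConfigHelper;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class BulkLoadJob {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = new Job(conf, "bulk-load-to-cassandra");
            job.setJarByClass(BulkLoadJob.class);

            // Write sstables locally and stream them to the cluster when the job completes.
            job.setOutputFormatClass(BulkOutputFormat.class);

            // Placeholder seed node, RPC port, keyspace, column family and partitioner.
            ConfigHelper.setOutputInitialAddress(job.getConfiguration(), "10.0.0.1");
            ConfigHelper.setOutputRpcPort(job.getConfiguration(), "9160");
            ConfigHelper.setOutputColumnFamily(job.getConfiguration(), "Keyspace1", "Users");
            ConfigHelper.setOutputPartitioner(job.getConfiguration(),
                    "org.apache.cassandra.dht.RandomPartitioner");

            // Mapper/reducer setup omitted; the reduce side is assumed to emit
            // (ByteBuffer rowKey, List<org.apache.cassandra.thrift.Mutation>) pairs,
            // as with ColumnFamilyOutputFormat.

            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }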
