
Custom MapReduce input format for Cassandra using the native protocol

I am using Apache Cassandra 1.2 and Hadoop MapReduce to crunch some data. At the moment I use CqlPagingInputFormat from org.apache.cassandra.hadoop.cql3. This provider uses Thrift to pull data. Thrift seems fairly slow (reading 300M records from a 3-node cluster takes 8+ hours), and since a native binary protocol exists, I wonder whether anyone has used it for this.

I am not interested in other optimizations or configuration tweaks; that's a separate issue.

My questions are:

  1. Is there an implementation of a MapReduce input format that directly uses the Cassandra native protocol?

  2. If not, what would be the first steps to write my own, for example using a DataStax driver?

Cassandra 2.0.7 includes native protocol analogs for the CQL Hadoop classes:

  org.apache.cassandra.hadoop.cql3.CqlInputFormat
  org.apache.cassandra.hadoop.cql3.CqlRecordReader
  org.apache.cassandra.hadoop.cql3.CqlConfigHelper

The WordCount code in examples/hadoop_cql3_word_count has been updated to use these classes.

The JIRA issue that introduced these classes is CASSANDRA-6311: https://issues.apache.org/jira/browse/CASSANDRA-6311
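Regarding question 2, if you do write your own: a Cassandra input format generally starts by dividing the token ring into ranges, one per input split, and then has each record reader issue a token-bounded CQL query for its range. Over the native protocol you would run that query through a driver such as the DataStax Java driver. The following is a minimal, dependency-free sketch of only the splitting and query-building step, assuming Murmur3Partitioner (the default since Cassandra 1.2, whose tokens are signed 64-bit longs). The class `TokenRangeSplits`, its `split` method, and the table and key names are hypothetical; this is not how CqlInputFormat itself is implemented.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch: divide the Murmur3Partitioner token ring into
 * contiguous ranges (one per Hadoop input split) and build the CQL query
 * a record reader could execute for each range.
 */
public class TokenRangeSplits {

    /** One split covering the token range (startExclusive, endInclusive]. */
    public static final class Split {
        final long startExclusive;
        final long endInclusive;

        Split(long startExclusive, long endInclusive) {
            this.startExclusive = startExclusive;
            this.endInclusive = endInclusive;
        }

        /** Token-bounded query for this split; table and key names are placeholders. */
        String toCql(String table, String partitionKey) {
            return "SELECT * FROM " + table
                    + " WHERE token(" + partitionKey + ") > " + startExclusive
                    + " AND token(" + partitionKey + ") <= " + endInclusive;
        }
    }

    /** Divides (start, end] into n contiguous, non-overlapping ranges. */
    public static List<Split> split(long start, long end, int n) {
        if (n <= 0 || end <= start) {
            throw new IllegalArgumentException("need n > 0 and end > start");
        }
        // BigInteger avoids overflow: end - start can exceed Long.MAX_VALUE.
        BigInteger s = BigInteger.valueOf(start);
        BigInteger width = BigInteger.valueOf(end).subtract(s);
        List<Split> splits = new ArrayList<>(n);
        long prev = start;
        for (int i = 1; i <= n; i++) {
            // The last range always ends exactly at `end`.
            long next = (i == n)
                    ? end
                    : s.add(width.multiply(BigInteger.valueOf(i))
                            .divide(BigInteger.valueOf(n))).longValueExact();
            splits.add(new Split(prev, next));
            prev = next;
        }
        return splits;
    }

    public static void main(String[] args) {
        // Whole Murmur3 ring; Long.MIN_VALUE is the minimum token and is never assigned to a key.
        for (Split sp : split(Long.MIN_VALUE, Long.MAX_VALUE, 4)) {
            System.out.println(sp.toCql("my_keyspace.my_table", "id"));
        }
    }
}
```

Dividing the ring evenly keeps the sketch short. A real input format would instead align splits with the token ranges each node owns, so that map tasks can be scheduled on the node holding their data.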
