
Custom map-reduce input formatter for Cassandra using native protocol

Original post: 2014-04-21 12:45:22 | Tags: java, hadoop, mapreduce, cassandra, datastax-java-driver

I am using Apache Cassandra (1.2) and Apache Map-Reduce to crunch some data. At the moment I use CqlPagingInputFormat from org.apache.cassandra.hadoop.cql3. This input format uses Thrift to pull data. Thrift seems to be fairly slow (reading 300M records on a 3-node cluster takes 8+ hours), and since a native binary protocol exists, I wonder if anyone has used it for this.

I am not interested in other optimizations or configuration tweaks; that is a separate issue.

My questions are:

1. Is there an implementation of a map-reduce input format that directly uses the Cassandra native protocol?
2. If not, what would be the first steps to write my own, for example using the DataStax driver?

1 answer

Cassandra 2.0.7 includes native protocol analogs for the CQL Hadoop classes:

org.apache.cassandra.hadoop.cql3.CqlInputFormat
org.apache.cassandra.hadoop.cql3.CqlRecordReader
org.apache.cassandra.hadoop.cql3.CqlConfigHelper

The WordCount code in examples/hadoop_cql3_word_count has been updated to use these classes.

The JIRA that introduced this is https://issues.apache.org/jira/browse/CASSANDRA-6311
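
For orientation, here is a minimal sketch of how a map-only job might be wired to these classes, loosely modeled on the WordCount example mentioned above. The keyspace `my_keyspace`, the table `my_table`, its text partition key column `id`, the contact address `localhost` and the Murmur3Partitioner setting are all assumptions for illustration, not details from the question. It needs the Cassandra (2.0.7 or later), DataStax Java driver and Hadoop jars on the classpath and has not been run against a live cluster.

```java
import java.io.IOException;

import com.datastax.driver.core.Row;
import org.apache.cassandra.hadoop.ConfigHelper;
import org.apache.cassandra.hadoop.cql3.CqlConfigHelper;
import org.apache.cassandra.hadoop.cql3.CqlInputFormat;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class NativeReadJob {
    // Hypothetical schema names; replace with your own.
    static final String KEYSPACE = "my_keyspace";
    static final String TABLE = "my_table";

    // CqlInputFormat hands each mapper a Long key and a DataStax driver Row.
    public static class RowMapper extends Mapper<Long, Row, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);

        @Override
        protected void map(Long key, Row row, Context context)
                throws IOException, InterruptedException {
            // Assumes a text column named "id".
            context.write(new Text(row.getString("id")), ONE);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "native-protocol-read");
        job.setJarByClass(NativeReadJob.class);
        job.setMapperClass(RowMapper.class);
        job.setNumReduceTasks(0); // map-only, just to show the input side
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileOutputFormat.setOutputPath(job, new Path(args[0]));

        // Read through the native protocol instead of Thrift paging.
        job.setInputFormatClass(CqlInputFormat.class);

        Configuration conf = job.getConfiguration();
        ConfigHelper.setInputInitialAddress(conf, "localhost"); // assumed contact point
        ConfigHelper.setInputColumnFamily(conf, KEYSPACE, TABLE);
        ConfigHelper.setInputPartitioner(conf, "Murmur3Partitioner"); // assumed partitioner
        CqlConfigHelper.setInputCQLPageRowSize(conf, "1000");

        // The two bind markers are filled per split with its token range.
        CqlConfigHelper.setInputCql(conf,
                "select * from " + TABLE
                + " where token(id) > ? and token(id) <= ? allow filtering");

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

The page row size of 1000 is an arbitrary starting value; the WordCount example in examples/hadoop_cql3_word_count remains the authoritative reference for the exact setup.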


