
Hadoop Cassandra Pig - row count query runs slow with only 1 map task

Original post: 2015-02-17 20:08:58 · Tags: hadoop, cassandra, apache-pig

I have a 4-node Cassandra cluster that is also a Hadoop cluster.

When I run a Pig script that selects and counts the rows of a Cassandra table, it creates a Hadoop job with only 1 map task, and that job takes a long time to complete.

Why is Hadoop not creating multiple map tasks?

1 answer

The most likely cause is that the input splits generated by the Hadoop input format are large enough to cover your entire token range. Since each split becomes one map task, try shrinking the input split size so that more map tasks are created.
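To see why an oversized split yields a single map task, here is a minimal Python sketch of the arithmetic. It is a simplification, not Cassandra's actual split code: it assumes each token range is cut into ceil(rows / split_size) splits (at least one per range) and that each split becomes one map task. The function `estimate_map_tasks` and the row counts are hypothetical. In a real job the knob is the Cassandra input split size, measured in rows: in Hadoop job configuration this is set with `ConfigHelper.setInputSplitSize` (the `cassandra.input.split.size` property), and Pig's `CqlStorage` in Cassandra 2.x accepts a `split_size` parameter in its `cql://` load URL. Check the documentation for your Cassandra version, since these details have changed across releases.

```python
import math


def estimate_map_tasks(rows_per_range, split_size):
    """Rough, hypothetical estimate of the number of input splits (map tasks).

    Simplifying assumption: every token range is divided into
    ceil(rows / split_size) splits, and even an empty range
    still produces one split.
    """
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    total = 0
    for rows in rows_per_range:
        # At least one split per range, more if the range exceeds split_size.
        total += max(1, math.ceil(rows / split_size))
    return total


# One range covering the whole ring, with a split size larger than the row count.
print(estimate_map_tasks([50_000], 65_536))
# Same data with a much smaller split size.
print(estimate_map_tasks([50_000], 5_000))
```

With a single range holding a hypothetical 50,000 rows, a split size of 65,536 produces one map task, while a split size of 5,000 produces ten, which can then run in parallel across the nodes.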


The technical post webpages of this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, contact: yoyou2525@163.com.


Answer posted: 2015-02-18 01:07:34