
Hadoop Cassandra Pig - row count query runs slow with only 1 map task

I have a 4-node Cassandra cluster that also serves as a Hadoop cluster.

When I run a Pig script to select and count the rows of a Cassandra table, it creates a Hadoop job with only 1 map task, and that job takes a long time to complete.

Why is Hadoop not creating multiple map tasks?

The most likely cause is that the splits generated by the Hadoop input format are large enough that a single split covers your entire token range. Try reducing the input split size so that more splits, and therefore more map tasks, are created.
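To get a feel for why a large split size can leave you with a single map task, here is a rough back-of-the-envelope sketch in Python. It assumes roughly one map task per split and that each split covers about `split_size` rows; it ignores how token ranges are distributed across nodes, so treat it as an approximation only. The default of 64K rows and the property name `cassandra.input.split.size` are what I recall from the Cassandra Hadoop integration; please verify both against the version you run. The row count is hypothetical.

```python
import math

# Assumed default for cassandra.input.split.size (64K rows); verify for your version.
ASSUMED_DEFAULT_SPLIT_SIZE = 64 * 1024


def estimated_map_tasks(estimated_rows, split_size=ASSUMED_DEFAULT_SPLIT_SIZE):
    """Rough estimate: one map task per split, each covering ~split_size rows."""
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    # At least one task is always created, even for a tiny table.
    return max(1, math.ceil(estimated_rows / split_size))


if __name__ == "__main__":
    rows = 50_000  # hypothetical table size, not taken from the question
    for size in (ASSUMED_DEFAULT_SPLIT_SIZE, 10_000, 1_000):
        print(f"split_size={size}: ~{estimated_map_tasks(rows, size)} map task(s)")
```

Under these assumptions, a table smaller than the split size fits in one split, which would match the single map task you are seeing. Lowering the split size below the row count is what makes the job fan out.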

