
Hadoop word count

In Hadoop's word count example, the map function writes out each word with a count of one as the intermediate result, and the reduce function does the summing. Why not use a HashMap in the map function, with the word as the key and the count as the value? If a word occurs more than once in one file split, its count is incremented, and at the end of the map function the accumulated results are written out.
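A minimal sketch of that in-mapper aggregation, assuming the standard org.apache.hadoop.mapreduce API (the class and variable names here are illustrative, not from the Hadoop examples): counts accumulate in a HashMap during map() and are emitted once in cleanup(), which Hadoop calls at the end of the split.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// In-mapper combining: accumulate per-word counts in a HashMap
// and emit them once, when the mapper finishes its split.
public class InMapperWordCountMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

    private final Map<String, Integer> counts = new HashMap<>();

    @Override
    protected void map(LongWritable key, Text value, Context context) {
        for (String word : value.toString().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
    }

    @Override
    protected void cleanup(Context context)
            throws IOException, InterruptedException {
        // Emit the aggregated counts once per split.
        Text outKey = new Text();
        IntWritable outValue = new IntWritable();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            outKey.set(e.getKey());
            outValue.set(e.getValue());
            context.write(outKey, outValue);
        }
    }
}
```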

Done this way, it should be more efficient than the original design without a combiner; with a combiner, the efficiency should be roughly equal.
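For comparison, this is roughly how a combiner is wired into the classic word count driver. TokenizerMapper and IntSumReducer are assumed here to be the mapper and reducer classes from Hadoop's standard WordCount example; because summing is associative and commutative, the reducer class can be reused directly as the combiner.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Word count driver with a combiner: map output is pre-aggregated
// on each mapper's node before the shuffle, cutting network traffic.
public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(TokenizerMapper.class);  // assumed: the standard example's mapper
        job.setCombinerClass(IntSumReducer.class);  // same class doubles as the combiner
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```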

Any advice?

Yes, you can use a HashMap as well, but you need to consider worst-case scenarios when designing your solution.

Normally a block is 128 MB. Consider input made up of short words with few or no repetitions: you will have many distinct words, so the number of entries in the HashMap grows and it consumes much more memory. Keep in mind that many different jobs may be running on the same data node, so a HashMap eating a large amount of RAM will eventually slow those other jobs down as well. Also, as the HashMap grows it has to rehash, which adds time to your job execution. A bounded variant is sketched below.
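One common mitigation, sketched here under the same assumptions as the mapper above (MAX_ENTRIES is an illustrative threshold, not a Hadoop setting), is to cap the HashMap and flush partial sums early. Correctness is preserved because the reducer sums whatever partial counts it receives for a word.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Memory-bounded in-mapper combining: flush partial counts whenever
// the map holds too many distinct words, so RAM use stays predictable
// even on splits full of short, rarely repeated words.
public class BoundedInMapperWordCountMapper
        extends Mapper<LongWritable, Text, Text, IntWritable> {

    // Illustrative cap on distinct words held in memory at once.
    private static final int MAX_ENTRIES = 100_000;

    private final Map<String, Integer> counts = new HashMap<>();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String word : value.toString().split("\\s+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        if (counts.size() > MAX_ENTRIES) {
            flush(context); // emit partial sums; the reducer adds them up
        }
    }

    @Override
    protected void cleanup(Context context)
            throws IOException, InterruptedException {
        flush(context); // emit whatever remains at the end of the split
    }

    private void flush(Context context)
            throws IOException, InterruptedException {
        Text outKey = new Text();
        IntWritable outValue = new IntWritable();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            outKey.set(e.getKey());
            outValue.set(e.getValue());
            context.write(outKey, outValue);
        }
        counts.clear();
    }
}
```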

I know this is an old post, but for anyone looking for Hadoop help in the future, this question may serve as another reference: Hadoop word count: receiving the total number of words that start with the letter "c"
