
Where does Hadoop store the output files of the mapper, partitioner, and combiner?

I am running MapReduce jobs on a pseudo-distributed Hadoop setup. Where do I find the output files of the mapper, partitioner, and combiner? Is there a way to inspect the output of each operation?

Intermediate output in MapReduce is stored in local temporary storage on the node where the task ran, not in HDFS.

You can look up in your Hadoop configuration where the local temp directories are, then inspect them manually, node by node.
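To make the lookup concrete, here is a minimal sketch that scans a Hadoop-style configuration XML for the properties that usually control local scratch space. The property names `hadoop.tmp.dir`, `mapreduce.cluster.local.dir`, and `yarn.nodemanager.local-dirs` are the standard ones in Hadoop 2 and later, but the answer above does not name them, so treat the list as an assumption and adjust it for your version. The class name `LocalDirLookup` and the sample paths are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class LocalDirLookup {
    // Assumed property names; older releases use different keys (e.g. mapred.local.dir).
    static final String[] KEYS = {
        "hadoop.tmp.dir",
        "mapreduce.cluster.local.dir",
        "yarn.nodemanager.local-dirs"
    };

    // Returns the local-directory properties explicitly set in the given config XML.
    static Map<String, String> findLocalDirs(String xml) {
        Map<String, String> found = new LinkedHashMap<>();
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            NodeList props = doc.getElementsByTagName("property");
            for (int i = 0; i < props.getLength(); i++) {
                Element prop = (Element) props.item(i);
                String name = childText(prop, "name");
                String value = childText(prop, "value");
                for (String key : KEYS) {
                    if (key.equals(name) && value != null) {
                        found.put(name, value);
                    }
                }
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse configuration XML", e);
        }
        return found;
    }

    // Text of the first child element with the given tag, or null if absent.
    static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) {
        // Hypothetical config contents; in practice read core-site.xml, mapred-site.xml, etc.
        String sample =
            "<configuration>"
            + "<property><name>hadoop.tmp.dir</name><value>/tmp/hadoop-demo</value></property>"
            + "<property><name>mapreduce.cluster.local.dir</name>"
            + "<value>/tmp/hadoop-demo/mapred/local</value></property>"
            + "</configuration>";
        findLocalDirs(sample).forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```

Properties that are not set in your site files fall back to Hadoop's built-in defaults, so an empty result does not mean there is no local directory. Values may also contain placeholders such as `${hadoop.tmp.dir}` that this sketch does not expand.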

In general, there are probably better ways to get what you are after, such as log messages or counters. Another option is to turn off the reducers (set the number of reduce tasks to zero) so that your mappers write their output directly to HDFS, where you can inspect it.
