
Can a reducer pass a message to the driver in Hadoop MapReduce?

I have to implement a loop of MapReduce jobs. Each iteration terminates or continues the loop depending on the result of the previous one. The decision is based on whether a particular word appears in the reducer output.

Of course, my driver program could inspect the whole output text file. But I am looking for a single word, and going through the whole file would be overkill. Is there any way to set up communication between the reducer and the driver, so that the reducer can notify the driver as soon as it detects the word? The message to be transferred is tiny.

The solution you describe will not be clean, and it will be hard to maintain.

There are multiple ways to achieve what you have asked for.

 1. As soon as the reducer finds the word, it writes to an HDFS location (it opens a file under a predefined directory on HDFS, filedir, and writes there).
 2. The client keeps polling the predefined filedir and the output dir of the job. If the output dir is found and there is no filedir, the word wasn't there.
 3. Use ZooKeeper.
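
Items 1 and 2 above can be sketched roughly as follows. This is only an illustration under several assumptions: it uses `java.nio` on a local directory as a stand-in for HDFS so that it runs without a cluster, the class, method, and marker names are hypothetical, and it treats the `_SUCCESS` file that Hadoop's default output committer writes as the "job finished" signal rather than the mere existence of the output dir. In a real job the same calls would go through Hadoop's `FileSystem` API.

```java
import java.io.IOException;
import java.nio.file.FileAlreadyExistsException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

/** Local-filesystem stand-in for a marker-file handshake (hypothetical names). */
class MarkerSignal {

    /** Reducer side: create the marker the first time the word is seen. */
    static void raise(Path marker) throws IOException {
        Path parent = marker.getParent();
        if (parent != null) {
            Files.createDirectories(parent);
        }
        try {
            Files.createFile(marker);
        } catch (FileAlreadyExistsException e) {
            // Another reducer (or an earlier call) already raised the signal.
        }
    }

    /**
     * Driver side: one polling step.
     * Returns true if the word was found, false if the job finished without
     * the marker, and empty if the job still appears to be running.
     */
    static Optional<Boolean> poll(Path marker, Path outputDir) {
        // Read the completion flag first: the marker is written while the job
        // runs, so once the job is done a missing marker is definitive.
        boolean finished = Files.exists(outputDir.resolve("_SUCCESS"));
        if (Files.exists(marker)) {
            return Optional.of(true);
        }
        return finished ? Optional.of(false) : Optional.empty();
    }
}
```

The driver would call `poll` in a loop with a short sleep until it gets a non-empty result, and delete the marker before starting the next iteration.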

The best solution would be to emit from the mapper only if the word is found, and otherwise emit nothing. This will speed up your job and spawn a single reducer. You can then safely check whether the output of the job contains any file. Use lazy initialization of the output, so that if no rows reach the reducer, no output file is created.
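
The driver-side check for this approach might look like the sketch below. It is a minimal illustration, not a definitive implementation: it uses `java.nio` on a local directory as a stand-in for HDFS, the class and method names are hypothetical, and it assumes the job writes its results as `part-*` files. In an actual driver, lazy output is typically enabled with `LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class)`, and the listing would go through Hadoop's `FileSystem` API instead.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

/** Local-filesystem stand-in for the "did the job write anything?" check. */
class OutputProbe {

    /**
     * Returns true if the output directory holds at least one part file.
     * With lazy output, a part file exists only if some record was written,
     * so other files such as _SUCCESS are deliberately ignored.
     */
    static boolean hasPartFiles(Path outputDir) throws IOException {
        if (!Files.isDirectory(outputDir)) {
            return false;
        }
        try (DirectoryStream<Path> parts =
                 Files.newDirectoryStream(outputDir, "part-*")) {
            return parts.iterator().hasNext();
        }
    }
}
```

After each job completes, the driver calls `hasPartFiles` on that iteration's output directory and uses the result to decide whether to launch the next iteration.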


 