
Mapper progress in Hadoop for running task

I'm processing zip files in Hadoop. Each zip file contains 2000 XML files. A single mapper takes 60 to 90 minutes to complete the process. I'm using Windows on a 6-core machine with 12 GB RAM.

My question is: the progress bar only shows a result once the process has completed. The progress status stays at 0% until the task finishes.

How can I programmatically change the progress value?

I tried the following code:

InputDocXmlCount++;
if (InputDocXmlCount % 100 == 0)
{
    context.progress();
    runningJob.mapProgress();
}

But I don't know how to do this. Can anyone help me?

The MR framework can't decide what percentage to show because (I assume) you are using a custom InputFormat. The framework is not clever enough to count the XML files in the zip for you, or to predict that you will report progress once per 100 records.
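For context, the map percentage is normally derived from the value that the InputFormat's RecordReader returns from getProgress(), a float between 0 and 1, whereas context.progress() only signals that the task is still alive. A minimal sketch of the bookkeeping a custom reader could delegate to is shown here. The class name EntryProgress is hypothetical, and it assumes the reader knows the number of entries up front (2000 per zip in the question), which a streaming zip reader may not.

```java
// Hypothetical helper, not part of Hadoop: tracks how many zip entries a
// custom RecordReader has handed out so far.
class EntryProgress {
    private final int totalEntries;
    private int entriesRead;

    EntryProgress(int totalEntries) {
        if (totalEntries <= 0) {
            throw new IllegalArgumentException("totalEntries must be positive");
        }
        this.totalEntries = totalEntries;
    }

    // Call once each time nextKeyValue() returns an entry.
    void entryRead() {
        entriesRead++;
    }

    // Value a RecordReader#getProgress() override could return;
    // clamped so it never exceeds 1.0.
    float getProgress() {
        return Math.min(1.0f, (float) entriesRead / totalEntries);
    }
}
```

If the entry count is not known in advance, the same idea works with bytes consumed divided by the length of the input split.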

However, take a look at MR counters. At the very least, you can count the XML files you have already processed.
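A rough sketch of the counter idea follows. The enum XmlCounter and the class XmlFileCounting are names invented for this example. So that the snippet compiles without Hadoop on the classpath, the counter is passed in as a LongConsumer; inside a mapper, that argument could be context.getCounter(XmlCounter.XML_FILES_PROCESSED)::increment.

```java
import java.util.function.LongConsumer;

// Hypothetical counter name, chosen for this sketch.
enum XmlCounter { XML_FILES_PROCESSED }

class XmlFileCounting {
    // Adds 1 to the counter sink for every XML entry handled.
    // In a real mapper the sink could be
    // context.getCounter(XmlCounter.XML_FILES_PROCESSED)::increment
    static void processEntries(Iterable<String> xmlEntries, LongConsumer counterSink) {
        for (String entry : xmlEntries) {
            // ... parse the XML entry here ...
            counterSink.accept(1L);
        }
    }
}
```

The counter value then appears alongside the job's other counters while the task is still running, even though the percentage itself does not move.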

You don't have direct control of the progress value, but you could consider setting a customized status message by calling TaskAttemptContext#setStatus from within your mapper code. For example, you could make this a dynamic message that includes the count of XML files processed, and periodically update that count in the status string.
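One way this could look, as a sketch rather than a definitive implementation: the StatusReporter class below is hypothetical and takes the status sink as a Consumer<String> so that it compiles on its own. In a mapper it could be created in setup() as new StatusReporter(context::setStatus, 100), reusing the interval of 100 from the question.

```java
import java.util.function.Consumer;

// Hypothetical helper: publishes a status string every `interval` files.
class StatusReporter {
    private final Consumer<String> statusSink;
    private final int interval;
    private int processed;

    StatusReporter(Consumer<String> statusSink, int interval) {
        if (interval <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        this.statusSink = statusSink;
        this.interval = interval;
    }

    // Call once per XML file; updates the status on every interval-th call.
    void fileProcessed() {
        processed++;
        if (processed % interval == 0) {
            statusSink.accept("Processed " + processed + " XML files");
        }
    }
}
```

Calling fileProcessed() from map() after each XML file keeps the status string reasonably current without updating it on every record.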


 