Map Reduce Completed but Pig Job Failed

I recently came across a scenario where a MapReduce job appears to be successful in the RM, whereas the Pig script returned with exit code 8, which refers to "Throwable thrown (an unexpected exception)".

Added the script as requested:

REGISTER '$LIB_LOCATION/*.jar'; 

-- set number of reducers (200 in our runs)
SET default_parallel $REDUCERS;
SET mapreduce.map.memory.mb 3072;
SET mapreduce.reduce.memory.mb 6144;

SET mapreduce.map.java.opts -Xmx2560m;
SET mapreduce.reduce.java.opts -Xmx5120m;
SET mapreduce.job.queuename dt_pat_merchant;

SET yarn.app.mapreduce.am.command-opts -Xmx5120m;
SET yarn.app.mapreduce.am.resource.mb 6144;

-- load data from the EAP data catalog for the given environment ($ENV = PROD)
data = LOAD 'eap-$ENV://event'
-- using a custom function
USING com.XXXXXX.pig.DataDumpLoadFunc
('{"startDate": "$START_DATE", "endDate" : "$END_DATE", "timeType" : "$TIME_TYPE", "fileStreamType":"$FILESTREAM_TYPE", "attributes": { "all": "true" } }', '$MAPPING_XML_FILE_PATH');

-- filter out null context entity records
filtered = FILTER data BY (attributes#'context_id' IS NOT NULL);

-- group data by session id
session_groups = GROUP filtered BY attributes#'context_id';

-- flatten events
flattened_events = FOREACH session_groups GENERATE FLATTEN(filtered);

-- remove the output directory if it exists
RMF $OUTPUT_PATH;

-- store results in specified output location
STORE flattened_events INTO '$OUTPUT_PATH' USING com.XXXX.data.catalog.pig.EventStoreFunc();

And I can see "ERROR 2998: Unhandled internal error. GC overhead limit exceeded" in the Pig logs (log below):

Pig Stack Trace
---------------
ERROR 2998: Unhandled internal error. GC overhead limit exceeded

java.lang.OutOfMemoryError: GC overhead limit exceeded
        at org.apache.hadoop.mapreduce.FileSystemCounter.values(FileSystemCounter.java:23)
        at org.apache.hadoop.mapreduce.counters.FileSystemCounterGroup.findCounter(FileSystemCounterGroup.java:219)
        at org.apache.hadoop.mapreduce.counters.FileSystemCounterGroup.findCounter(FileSystemCounterGroup.java:199)
        at org.apache.hadoop.mapreduce.counters.FileSystemCounterGroup.findCounter(FileSystemCounterGroup.java:210)
        at org.apache.hadoop.mapreduce.counters.AbstractCounters.findCounter(AbstractCounters.java:154)
        at org.apache.hadoop.mapreduce.TypeConverter.fromYarn(TypeConverter.java:241)
        at org.apache.hadoop.mapreduce.TypeConverter.fromYarn(TypeConverter.java:370)
        at org.apache.hadoop.mapreduce.TypeConverter.fromYarn(TypeConverter.java:391)
        at org.apache.hadoop.mapred.ClientServiceDelegate.getTaskReports(ClientServiceDelegate.java:451)
        at org.apache.hadoop.mapred.YARNRunner.getTaskReports(YARNRunner.java:594)
        at org.apache.hadoop.mapreduce.Job$3.run(Job.java:545)
        at org.apache.hadoop.mapreduce.Job$3.run(Job.java:543)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
        at org.apache.hadoop.mapreduce.Job.getTaskReports(Job.java:543)
        at org.apache.pig.backend.hadoop.executionengine.shims.HadoopShims.getTaskReports(HadoopShims.java:235)
        at org.apache.pig.tools.pigstats.mapreduce.MRJobStats.addMapReduceStatistics(MRJobStats.java:352)
        at org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil.addSuccessJobStats(MRPigStatsUtil.java:233)
        at org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil.accumulateStats(MRPigStatsUtil.java:165)
        at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:360)
        at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.launchPig(HExecutionEngine.java:282)
        at org.apache.pig.PigServer.launchPlan(PigServer.java:1431)
        at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1416)
        at org.apache.pig.PigServer.execute(PigServer.java:1405)
        at org.apache.pig.PigServer.executeBatch(PigServer.java:456)
        at org.apache.pig.PigServer.executeBatch(PigServer.java:439)
        at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:171)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:234)
        at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
        at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:81)
        at org.apache.pig.Main.run(Main.java:624)

The configuration in the Pig script looks like this:

SET default_parallel 200;
SET mapreduce.map.memory.mb 3072;
SET mapreduce.reduce.memory.mb 6144;

SET mapreduce.map.java.opts -Xmx2560m;
SET mapreduce.reduce.java.opts -Xmx5120m;
SET mapreduce.job.queuename dt_pat_merchant;

SET yarn.app.mapreduce.am.command-opts -Xmx5120m;
SET yarn.app.mapreduce.am.resource.mb 6144;

The status of the job in the RM of the cluster says the job succeeded [can't post the image as my reputation is too low ;)].

This issue occurs frequently, and we have to restart the job to get it to succeed.

Please let me know a fix for this.

PS: The cluster the job runs on is one of the biggest in the world, so resources and storage space are not a concern, I'd say.

Thanks

From the Oracle docs:

After a garbage collection, if the Java process is spending more than approximately 98% of its time doing garbage collection, is recovering less than 2% of the heap, and has been doing so for the last 5 (compile-time constant) consecutive garbage collections, then a java.lang.OutOfMemoryError is thrown. The java.lang.OutOfMemoryError exception for "GC overhead limit exceeded" can be turned off with the command line flag -XX:-UseGCOverheadLimit.

As said in the docs, you can turn this exception off or increase the heap size.
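
For example, a minimal sketch of turning the check off for the Pig client JVM, assuming a standard bin/pig launcher that honors the PIG_OPTS environment variable (note this only masks the symptom; the client can still run out of memory):

# sketch: disable the GC overhead limit check in the Pig client JVM
# (myscript.pig is a placeholder for your actual script)
export PIG_OPTS="$PIG_OPTS -XX:-UseGCOverheadLimit"
pig -f myscript.pig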

Can you add your pig script here?

I think you get this error because Pig itself (not the mappers and reducers) can't handle the output; the stack trace shows the OutOfMemoryError thrown from MRJobStats.addMapReduceStatistics, i.e., in the Pig client while it gathers job statistics after the MapReduce job has already finished, which would explain why the RM shows success. If you use a DUMP operation in your script, try to limit the displayed dataset first. Let's assume you have an alias X for your data. Try:

temp = LIMIT X 1;
DUMP temp;

Thus, you will see only one record and save some resources. You can do a STORE operation as well (see the Pig manual for how to do it).
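
For instance, a small sketch that writes a one-record sample to a file instead of dumping it to the console (the output path is just an illustration):

-- sketch: store a tiny sample instead of DUMPing the whole alias
temp = LIMIT X 1;
STORE temp INTO '/tmp/x_sample' USING PigStorage(',');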

Obviously, you can configure Pig's heap size to be bigger, but Pig's heap size is not controlled by the mapreduce.map.* or mapreduce.reduce.* properties. Use the PIG_HEAPSIZE environment variable to do that.
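
For example, a sketch assuming the standard bin/pig launcher, which reads PIG_HEAPSIZE in MB (4096 is just an illustrative value):

# sketch: raise the Pig client JVM heap to 4 GB before launching the script
export PIG_HEAPSIZE=4096
pig -f myscript.pig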
