
Exception caught when trying to run a MapReduce job from a Java application

I need to invoke a MapReduce job from a Java application. I use:

ToolRunner.run(new Validation(), pathsMoveToFinal.toArray(new String[pathsMoveToFinal.size()]));
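
Validation extends Configured and implements Tool, roughly along these lines (a simplified sketch; the output types and input/output path handling here are only illustrative, and only the mapper class name matches my real code):

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;

public class Validation extends Configured implements Tool {
    @Override
    public int run(String[] args) throws Exception {
        Job job = new Job(getConf(), "data validation");           // 0.20-era mapreduce API, matching the stack trace below
        job.setJarByClass(Validation.class);
        job.setMapperClass(utils.DataValidationExtractorMapper.class); // the mapper from my project (package utils)
        job.setNumReduceTasks(0);                                   // illustrative: map-only validation
        job.setOutputKeyClass(Text.class);                          // illustrative output types
        job.setOutputValueClass(NullWritable.class);
        // args are the paths handed in through ToolRunner.run(...)
        for (int i = 0; i < args.length - 1; i++) {
            FileInputFormat.addInputPath(job, new Path(args[i]));
        }
        FileOutputFormat.setOutputPath(job, new Path(args[args.length - 1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }
}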

If I don't set the conf's mapred.job.jobtracker, the job seems to run forever: the map progress reaches 100% and then drops back to a lower percentage. If I set mapred.job.jobtracker, it complains that the mapper class cannot be found:

java.lang.RuntimeException: java.lang.ClassNotFoundException:  utils.DataValidationExtractorMapper
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:809)
at org.apache.hadoop.mapreduce.JobContext.getMapperClass(JobContext.java:157)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:569)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:305)
at org.apache.hadoop.mapred.Child.main(Child.java:170)
Caused by: java.lang.ClassNotFoundException: utils.DataValidationExtractorMapper
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:762)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:807)
... 4 more

Could anyone please give me a hint? Thank you and have a good weekend.

Since you're using Maven, I highly recommend baking your dependencies statically into your JAR.

The reason this occurs is that the JVMs running your Mapper and Reducer tasks have no pre-existing knowledge of your client's classpath. Baking in the dependencies is future-proof and stable, and Hadoop should work with such a JAR quite happily.

Please see my previous answer (and the other answers) here:

How to make a monolithic jar file?

Then run it with hadoop jar.

Setting the classpath on shared/unowned boxes can be a big issue, since the jar files have to be replicated to all the task servers. Add one server, forget to set the classpath, and ouch: your job breaks on some task machines but runs on others. Try debugging that when you have 100 boxes! A monolithic jar lets you encapsulate all of your dependencies into one big, distributable jar.

Solved. It wasn't a Maven issue after all. When starting a MapReduce job from Java code, I have to pack the MapReduce job into a jar, because Hadoop copies that jar to the different task JVMs. Thanks for all the suggestions!
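
Concretely: when the job is submitted from plain Java code rather than via the hadoop jar command, Hadoop has to be told which jar to ship to the task JVMs. A rough sketch of the two usual ways to do that (the helper class and jar path are just placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class JobJarSetup {                                          // illustrative helper, not from my real code
    static Job newValidationJob() throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "data validation");

        // Option 1: if Validation.class itself was loaded from the packaged job jar,
        // Hadoop locates that jar and copies it out to the task JVMs.
        job.setJarByClass(Validation.class);

        // Option 2: point Hadoop at the packaged jar explicitly
        // ("mapred.jar" is the property JobConf.setJar sets; the path is a placeholder).
        job.getConfiguration().set("mapred.jar", "/path/to/validation-job.jar");

        return job;
    }
}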
