
Why do we need to set HADOOP_CLASSPATH to the folder containing the jar in order to run it?

I was trying to run the WordCount program. I created wordcount.jar. Below are the contents of my jar.

META-INF/
META-INF/MANIFEST.MF
org/myorg/WordCount.class
org/myorg/WordCount$IntSumReducer.class
org/myorg/WordCount$TokenizerMapper.class

I ran the program using the command below:

hadoop jar ./wordcount.jar org.myorg.WordCount mreduce/input mreduce/output

However, I got the error below:

java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.myorg.WordCount$TokenizerMapper not found
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1895)
        at org.apache.hadoop.mapreduce.task.JobContextImpl.getMapperClass(JobContextImpl.java:191)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:631)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:268)

But then I used export HADOOP_CLASSPATH=<folder where the jar was present>

And the issue was resolved. Can someone please explain this?

This is more of a configuration question than a Hadoop question. Your WordCount Java code needs the Hadoop and MapReduce jars (mostly the client ones). So when you run the wordcount jar, your code needs a reference to those jars, and it finds them through the HADOOP_CLASSPATH directory. That's why the job ran once the path was set.

Your system should have the classpath set as an environment variable.
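
As a rough sketch of setting that variable for the current shell session: the `JAR_DIR` name below is a placeholder of mine, not something from the question, and it defaults to the current directory. The snippet adds the folder in front of any existing HADOOP_CLASSPATH value instead of overwriting it, and it only calls `hadoop classpath` (which prints the classpath Hadoop will use) if a `hadoop` command is actually on the PATH.

```shell
# Placeholder: folder that holds wordcount.jar (assumed; adjust to your setup).
JAR_DIR="${JAR_DIR:-$PWD}"

# Put the folder first and keep whatever was already in HADOOP_CLASSPATH.
# The ${VAR:+...} form adds the ':' separator only when the variable is non-empty.
export HADOOP_CLASSPATH="${JAR_DIR}${HADOOP_CLASSPATH:+:$HADOOP_CLASSPATH}"

# Show what was set.
echo "HADOOP_CLASSPATH=$HADOOP_CLASSPATH"

# If Hadoop is installed, print the effective classpath so you can check
# that the folder appears in it.
if command -v hadoop >/dev/null 2>&1; then
    hadoop classpath
fi
```

The `hadoop jar` command from the question can then be run in the same shell. To make the setting survive new sessions, the export line would have to go into your shell profile.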

Alternatively, you can include all the required jars inside your wordcount jar itself (a fat jar).
