Running my own job on Giraph

So, I've successfully executed the SimpleShortestPathsComputation example on my computer via the script shown here:

#VARIABLES
user_dir=/user/hduser
jar=giraph-examples/target/giraph-examples-1.1.0-for-hadoop-2.4.0-jar-with-dependencies.jar
runner=org.apache.giraph.GiraphRunner
computation=org.apache.giraph.examples.SimpleShortestPathsComputation
informat=org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat
outformat=org.apache.giraph.io.formats.IdWithValueTextOutputFormat

#DELETE PREVIOUS
bin/hdfs dfs -rm -r $user_dir/output/shortestpaths

#GIRAPH JOB
bin/hadoop jar $GIRAPH_HOME/$jar $runner -Dgiraph.yarn.task.heap.mb=3000 $computation -vif $informat -vip $user_dir/input/tiny_graph.txt -vof $outformat -op $user_dir/output/shortestpaths -w 1

Now the problem is that I'm trying to run my own job. It's actually a direct copy-paste of the SimpleShortestPathsComputation class; I only changed the package name and the class name. I'm trying to run it with -libjars. Here's the full script:

#VARIABLES
user_dir=/user/hduser
jar=giraph-examples/target/giraph-examples-1.1.0-for-hadoop-2.4.0-jar-with-dependencies.jar
runner=org.apache.giraph.GiraphRunner
computation=org.apache.giraph.examples.SimpleShortestPathsComputation
informat=org.apache.giraph.io.formats.JsonLongDoubleFloatDoubleVertexInputFormat
outformat=org.apache.giraph.io.formats.IdWithValueTextOutputFormat
libjars=/usr/local/hadoop-2.4.0/lib/giraphtrials.jar,$GIRAPH_HOME/giraph-core.jar

#Setup class paths
export HADOOP_CLASSPATH=/usr/local/hadoop-2.4.0/lib/giraphtrials.jar:$GIRAPH_HOME/$jar:$HADOOP_CLASSPATH

#DELETE PREVIOUS
bin/hdfs dfs -rm -r $user_dir/output/shortestpaths

#GIRAPH JOB
bin/hadoop jar $GIRAPH_HOME/$jar $runner -libjars $libjars \
GiraphAlgs.GiraphPBFS -vif $informat -vip $user_dir/input/tiny_graph.txt \
-vof $outformat -op $user_dir/output/shortestpaths -w 1

As you can see, I've tried the -libjars and HADOOP_CLASSPATH suggestions from this Stack Overflow question to make it work, but unfortunately it still gives me a ClassNotFoundException. For better or worse, it no longer throws it at me in the terminal (it used to): terminal picture. As you can see, it now only fails with a general container message.

Unfortunately, it still gives me the java.lang.ClassNotFoundException in the logs: log picture. I'm using Hadoop 2.4.0 and Giraph 1.1.0. I'm running out of ideas about what might be wrong with my Giraph setup, and I'm starting to wonder whether I should change careers.

You need to have the jar that contains the class GiraphAlgs.GiraphPBFS on the Hadoop classpath.
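Before looking at the classpath itself, it may help to confirm that the jar really contains the class under the expected package path. The sketch below assumes the jar path from the question and that the JDK's jar tool is on the PATH; the helper names class_to_entry and jar_has_class are made up for illustration.

```shell
#!/bin/sh
# Convert a fully qualified class name to the entry name it has inside a jar,
# e.g. GiraphAlgs.GiraphPBFS -> GiraphAlgs/GiraphPBFS.class
class_to_entry() {
    printf '%s.class\n' "$(printf '%s' "$1" | tr '.' '/')"
}

# Return 0 if the jar ($1) lists exactly the entry for class ($2).
# `jar tf` prints one entry per line; grep -x requires a whole-line match,
# so a class nested under an extra directory does not count as a hit.
jar_has_class() {
    jar tf "$1" 2>/dev/null | grep -qxF "$(class_to_entry "$2")"
}

# Jar path and class name are taken from the question.
if jar_has_class /usr/local/hadoop-2.4.0/lib/giraphtrials.jar GiraphAlgs.GiraphPBFS; then
    echo "class found in jar"
else
    echo "class not found in jar (or jar not readable)"
fi
```

If the class shows up under a different path, for example inside an extra top-level folder, the package declaration and the jar layout do not match, and the class loader will not find it under the name GiraphAlgs.GiraphPBFS.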

Also, verify that your classpath is set correctly by running bin/hadoop classpath.
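The output of bin/hadoop classpath is a single colon-separated string, which is hard to scan by eye. A minimal sketch of checking it for a specific entry follows; classpath_has_entry is a hypothetical helper, and the jar path is the one from the question. Note that some entries in the output may be directory wildcards (ending in /*), which this literal comparison does not expand.

```shell
#!/bin/sh
# Return 0 if the colon-separated classpath ($1) contains exactly the entry ($2).
# Wrapping both sides in colons avoids matching a mere prefix or suffix.
classpath_has_entry() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Capture the effective classpath; empty if the command is unavailable here.
hadoop_cp="$(bin/hadoop classpath 2>/dev/null || true)"

if classpath_has_entry "$hadoop_cp" /usr/local/hadoop-2.4.0/lib/giraphtrials.jar; then
    echo "giraphtrials.jar is listed on the Hadoop classpath"
else
    echo "giraphtrials.jar is not listed literally on the Hadoop classpath"
fi
```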

Once, with Hadoop 2.7, setting the HADOOP_CLASSPATH variable didn't work for me, and I had to copy the jar into the Hadoop shared lib directory: HADOOP_HOME/share/hadoop/mapreduce/lib.
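As a rough sketch of that workaround, the copy could be wrapped in a small guard so that a wrong path fails loudly instead of silently. The function name install_jar is made up for illustration, and the example call assumes HADOOP_HOME is set and that the jar lives where the question's script puts it.

```shell
#!/bin/sh
# Copy a jar ($1) into a lib directory ($2), refusing if either path is missing.
install_jar() {
    [ -f "$1" ] || { echo "no such jar: $1" >&2; return 1; }
    [ -d "$2" ] || { echo "no such directory: $2" >&2; return 1; }
    cp "$1" "$2/"
}

# Example call (paths assumed from the question and the answer above):
#   install_jar /usr/local/hadoop-2.4.0/lib/giraphtrials.jar \
#       "$HADOOP_HOME/share/hadoop/mapreduce/lib"
```

On a cluster with more than one node, the jar would presumably need to be present in that directory on every node that runs containers.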
