
Implementing the Hadoop and MongoDB connector

I am working with Hadoop for the very first time because I plan to use it with MongoDB. After installing Hadoop, I tried to follow this tutorial and implement its example: http://docs.mongodb.org/ecosystem/tutorial/getting-started-with-hadoop/

Everything works until I run this command:

bash examples/treasury_yield/run_job.sh

Then I get the following output:

14/03/11 17:52:45 INFO util.MongoTool: Created a conf: 'Configuration: core-defa
ult.xml, core-site.xml, src/examples/hadoop-local.xml, src/examples/mongo-defaul
ts.xml' on {class com.mongodb.hadoop.examples.treasury.TreasuryYieldXMLConfig} a
s job named '<unnamed MongoTool job>'
14/03/11 17:52:46 INFO util.MongoTool: Mapper Class: class com.mongodb.hadoop.ex
amples.treasury.TreasuryYieldMapper
14/03/11 17:52:46 INFO util.MongoTool: Setting up and running MapReduce job in f
oreground, will wait for results.  {Verbose? true}
14/03/11 17:52:47 WARN fs.FileSystem: "localhost:9100" is a deprecated filesyste
m name. Use "hdfs://localhost:9100/" instead.
14/03/11 17:52:47 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop
.ipc.RemoteException: java.io.IOException: File /tmp/hadoop-goncalopereira/mapre
d/staging/goncalopereira/.staging/job_201403111752_0001/job.jar could only be re
plicated to 0 nodes, instead of 1
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBloc
k(FSNamesystem.java:1639)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.jav
a:736)
        at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
sorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInforma
tion.java:1149)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387)

        at org.apache.hadoop.ipc.Client.call(Client.java:1107)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
        at com.sun.proxy.$Proxy2.addBlock(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.
java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
sorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryI
nvocationHandler.java:85)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocat
ionHandler.java:62)
        at com.sun.proxy.$Proxy2.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock
(DFSClient.java:3686)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStrea
m(DFSClient.java:3546)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2600(DFSClien
t.java:2749)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFS
Client.java:2989)

14/03/11 17:52:47 WARN hdfs.DFSClient: Error Recovery for block null bad datanod
e[0] nodes == null
14/03/11 17:52:47 WARN hdfs.DFSClient: Could not get block locations. Source fil
e "/tmp/hadoop-goncalopereira/mapred/staging/goncalopereira/.staging/job_2014031
11752_0001/job.jar" - Aborting...
14/03/11 17:52:47 INFO mapred.JobClient: Cleaning up the staging area hdfs://loc
alhost:9100/tmp/hadoop-goncalopereira/mapred/staging/goncalopereira/.staging/job
_201403111752_0001
14/03/11 17:52:47 ERROR security.UserGroupInformation: PriviledgedActionExceptio
n as:goncalopereira cause:org.apache.hadoop.ipc.RemoteException: java.io.IOExcep
tion: File /tmp/hadoop-goncalopereira/mapred/staging/goncalopereira/.staging/job
_201403111752_0001/job.jar could only be replicated to 0 nodes, instead of 1
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBloc
k(FSNamesystem.java:1639)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.jav
a:736)
        at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
sorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInforma
tion.java:1149)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387)

14/03/11 17:52:47 ERROR util.MongoTool: Exception while executing job...
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /tmp/hadoop-gon
calopereira/mapred/staging/goncalopereira/.staging/job_201403111752_0001/job.jar
 could only be replicated to 0 nodes, instead of 1
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBloc
k(FSNamesystem.java:1639)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.jav
a:736)
        at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
sorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInforma
tion.java:1149)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387)

        at org.apache.hadoop.ipc.Client.call(Client.java:1107)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
        at com.sun.proxy.$Proxy2.addBlock(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.
java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
sorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryI
nvocationHandler.java:85)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocat
ionHandler.java:62)
        at com.sun.proxy.$Proxy2.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock
(DFSClient.java:3686)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStrea
m(DFSClient.java:3546)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2600(DFSClien
t.java:2749)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFS
Client.java:2989)
14/03/11 17:52:47 ERROR hdfs.DFSClient: Failed to close file /tmp/hadoop-goncalo
pereira/mapred/staging/goncalopereira/.staging/job_201403111752_0001/job.jar
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /tmp/hadoop-gon
calopereira/mapred/staging/goncalopereira/.staging/job_201403111752_0001/job.jar
 could only be replicated to 0 nodes, instead of 1
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBloc
k(FSNamesystem.java:1639)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.jav
a:736)
        at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
sorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInforma
tion.java:1149)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387)

        at org.apache.hadoop.ipc.Client.call(Client.java:1107)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
        at com.sun.proxy.$Proxy2.addBlock(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.
java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAcces
sorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryI
nvocationHandler.java:85)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocat
ionHandler.java:62)
        at com.sun.proxy.$Proxy2.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock
(DFSClient.java:3686)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStrea
m(DFSClient.java:3546)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2600(DFSClien
t.java:2749)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFS
Client.java:2989)

As you can guess, this is a bit overwhelming for a newbie like me. I assume it is a problem with Hadoop, but I am not sure exactly what. I would really appreciate it if someone could point me in the right direction.

Hi, I have connected Hadoop with MongoDB using the MongoDB connector by following this link:

hadoop connection with mongodb

You need to concentrate on this error:

ERROR security.UserGroupInformation: PriviledgedActionException as:goncalopereira cause:org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /tmp/hadoop-goncalopereira/mapred/staging/goncalopereira/.staging/job_201403111752_0001/job.jar could only be replicated to 0 nodes, instead of 1

  1. Check whether that jar is present at that path.

  2. Check whether your DataNode has actually started; it takes some time to come up. With no live DataNode, HDFS has nowhere to place the block, which is what "replicated to 0 nodes" reports.
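The second check can be sketched as a small script. This is only a sketch under assumptions: it relies on the JDK's jps tool printing one "<pid> <MainClass>" pair per line, and has_datanode is a hypothetical helper name, not part of Hadoop.

```shell
#!/bin/sh
# has_datanode (hypothetical helper): read jps-style output on stdin,
# one "<pid> <MainClass>" pair per line, and succeed only if some line's
# class name is exactly "DataNode" (so SecondaryNameNode does not match).
has_datanode() {
  awk '$2 == "DataNode" { found = 1 } END { exit !found }'
}

# Only inspect the running JVMs when the JDK's jps tool is on the PATH.
if command -v jps >/dev/null 2>&1; then
  if jps | has_datanode; then
    echo "DataNode is running"
  else
    echo "DataNode is NOT running; check its log before resubmitting the job"
  fi
fi
```

On a Hadoop 1.x install such as the one in the log, hadoop dfsadmin -report should also show how many DataNodes the NameNode currently considers live.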

Make sure Hadoop is installed correctly, and try running a sample job on Hadoop alone, without bringing MongoDB into the picture. This will show where things are going wrong. Hope it helps.
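One way to test Hadoop on its own is a tiny HDFS write, since the failing step in the log is exactly that: uploading job.jar to HDFS. The sketch below assumes the hadoop command is on the PATH; hdfs_smoke_test and the /tmp/hdfs-smoke-test-* target path are hypothetical names chosen for illustration.

```shell
#!/bin/sh
# hdfs_smoke_test (hypothetical helper): write a small file to HDFS, read it
# back, and delete it. The optional first argument is the command to run in
# place of "hadoop", which allows a dry run with a stub.
hdfs_smoke_test() {
  hadoop_cmd=${1:-hadoop}
  tmp=$(mktemp) || return 1
  echo "hello hdfs" > "$tmp"
  target="/tmp/hdfs-smoke-test-$$"   # assumed scratch path in HDFS

  "$hadoop_cmd" fs -put "$tmp" "$target" &&
    "$hadoop_cmd" fs -cat "$target" >/dev/null &&
    "$hadoop_cmd" fs -rm "$target" >/dev/null
  rc=$?

  rm -f "$tmp"
  return $rc
}

# Only talk to a real cluster when the hadoop command is available.
if command -v hadoop >/dev/null 2>&1; then
  if hdfs_smoke_test; then
    echo "HDFS write/read OK"
  else
    echo "HDFS write/read FAILED; fix HDFS before involving MongoDB"
  fi
fi
```

If the put step fails with the same "could only be replicated to 0 nodes" message, the problem lies in the HDFS setup rather than in the MongoDB connector.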
