
Implementing the Hadoop and MongoDB connector

I am working with Hadoop for the very first time, since I am planning to use it with MongoDB. After installing Hadoop, I tried to follow this tutorial and implement its example: http://docs.mongodb.org/ecosystem/tutorial/getting-started-with-hadoop/

Everything works until I call this command:

bash examples/treasury_yield/run_job.sh

Then I get the following output:

14/03/11 17:52:45 INFO util.MongoTool: Created a conf: 'Configuration: core-default.xml, core-site.xml, src/examples/hadoop-local.xml, src/examples/mongo-defaults.xml' on {class com.mongodb.hadoop.examples.treasury.TreasuryYieldXMLConfig} as job named '<unnamed MongoTool job>'
14/03/11 17:52:46 INFO util.MongoTool: Mapper Class: class com.mongodb.hadoop.examples.treasury.TreasuryYieldMapper
14/03/11 17:52:46 INFO util.MongoTool: Setting up and running MapReduce job in foreground, will wait for results.  {Verbose? true}
14/03/11 17:52:47 WARN fs.FileSystem: "localhost:9100" is a deprecated filesystem name. Use "hdfs://localhost:9100/" instead.
14/03/11 17:52:47 WARN hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /tmp/hadoop-goncalopereira/mapred/staging/goncalopereira/.staging/job_201403111752_0001/job.jar could only be replicated to 0 nodes, instead of 1
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1639)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:736)
        at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387)

        at org.apache.hadoop.ipc.Client.call(Client.java:1107)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
        at com.sun.proxy.$Proxy2.addBlock(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)
        at com.sun.proxy.$Proxy2.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3686)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3546)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2600(DFSClient.java:2749)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2989)

14/03/11 17:52:47 WARN hdfs.DFSClient: Error Recovery for block null bad datanode[0] nodes == null
14/03/11 17:52:47 WARN hdfs.DFSClient: Could not get block locations. Source file "/tmp/hadoop-goncalopereira/mapred/staging/goncalopereira/.staging/job_201403111752_0001/job.jar" - Aborting...
14/03/11 17:52:47 INFO mapred.JobClient: Cleaning up the staging area hdfs://localhost:9100/tmp/hadoop-goncalopereira/mapred/staging/goncalopereira/.staging/job_201403111752_0001
14/03/11 17:52:47 ERROR security.UserGroupInformation: PriviledgedActionException as:goncalopereira cause:org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /tmp/hadoop-goncalopereira/mapred/staging/goncalopereira/.staging/job_201403111752_0001/job.jar could only be replicated to 0 nodes, instead of 1
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1639)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:736)
        at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387)

14/03/11 17:52:47 ERROR util.MongoTool: Exception while executing job...
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /tmp/hadoop-goncalopereira/mapred/staging/goncalopereira/.staging/job_201403111752_0001/job.jar could only be replicated to 0 nodes, instead of 1
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1639)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:736)
        at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387)

        at org.apache.hadoop.ipc.Client.call(Client.java:1107)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
        at com.sun.proxy.$Proxy2.addBlock(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)
        at com.sun.proxy.$Proxy2.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3686)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3546)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2600(DFSClient.java:2749)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2989)
14/03/11 17:52:47 ERROR hdfs.DFSClient: Failed to close file /tmp/hadoop-goncalopereira/mapred/staging/goncalopereira/.staging/job_201403111752_0001/job.jar
org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /tmp/hadoop-goncalopereira/mapred/staging/goncalopereira/.staging/job_201403111752_0001/job.jar could only be replicated to 0 nodes, instead of 1
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:1639)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.addBlock(NameNode.java:736)
        at sun.reflect.GeneratedMethodAccessor6.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:578)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1393)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1389)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1387)

        at org.apache.hadoop.ipc.Client.call(Client.java:1107)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:229)
        at com.sun.proxy.$Proxy2.addBlock(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:85)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:62)
        at com.sun.proxy.$Proxy2.addBlock(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.locateFollowingBlock(DFSClient.java:3686)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:3546)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2600(DFSClient.java:2749)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2989)

As you can guess, this is a bit overwhelming to a newbie like me. I assume it's some problem with Hadoop, but I'm not entirely sure what. I really wish someone here could point me in the right direction.

Hi, I have connected Hadoop with MongoDB using the MongoDB Connector, following this link:

hadoop connection with mongodb
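
In case it's useful, the usual way to wire the connector in on Hadoop 1.x is to put the mongo-hadoop core jar and the MongoDB Java driver on Hadoop's classpath and restart the daemons. A rough sketch; the jar names below are placeholders for whichever versions you actually downloaded:

# Placeholder jar names; substitute the versions you actually use.
cp mongo-hadoop-core-1.2.0.jar $HADOOP_HOME/lib/
cp mongo-java-driver-2.11.4.jar $HADOOP_HOME/lib/
# Restart so the TaskTrackers pick up the new jars.
stop-all.sh && start-all.sh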

You need to concentrate on this error:

ERROR security.UserGroupInformation: PriviledgedActionException as:goncalopereira cause:org.apache.hadoop.ipc.RemoteException: java.io.IOException: File /tmp/hadoop-goncalopereira/mapred/staging/goncalopereira/.staging/job_201403111752_0001/job.jar could only be replicated to 0 nodes, instead of 1

It means the NameNode could not find a single live DataNode to write the job jar to. Two things to check:

  1. Check if that jar is present on the path.

  2. Check whether your DataNode has actually started, because it takes time to come up; see the sketch after this list.
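
A quick way to run those checks on a single-node Hadoop 1.x install like the tutorial's (a minimal sketch, assuming HADOOP_HOME is set and the 1.x control scripts are on your PATH):

# List the running Java daemons; a healthy single-node 1.x setup shows
# NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker.
jps

# Ask the NameNode how many DataNodes are alive.
# "Datanodes available: 0" is consistent with "replicated to 0 nodes".
hadoop dfsadmin -report

# If the DataNode is missing, its log usually says why (a namespace-ID
# mismatch after re-running "hadoop namenode -format" is a common cause):
tail -n 50 $HADOOP_HOME/logs/hadoop-*-datanode-*.log

# Restart everything and give the DataNode a few seconds before retrying.
stop-all.sh
start-all.sh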

Make sure your Hadoop is installed correctly, and try running a sample job on Hadoop alone, without bringing MongoDB into the picture; see the example below. This will show you whether the problem is on the Hadoop side or the MongoDB side. Hope it helps.
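
For example, once jps and dfsadmin look healthy, you can run one of the example jobs that ships with Hadoop; if this also fails, the problem is in the Hadoop install rather than in the connector. A sketch assuming a Hadoop 1.x tarball layout (the examples jar name varies by version):

# Estimate pi with 2 map tasks and 10 samples each; success proves that
# HDFS and MapReduce work end to end without MongoDB being involved.
hadoop jar $HADOOP_HOME/hadoop-examples-*.jar pi 2 10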
