
Not able to use mongo-hadoop connector in Hive query in HDP

I am new to Hadoop. I have installed the Hortonworks Sandbox 2.1 and I am trying to execute a Hive script through the Hive UI. I want to access a MongoDB collection from inside Hive, so I used the following query:

CREATE TABLE individuals
( 
  id INT,
  name STRING,
  age INT,
  city STRING,
  hobby STRING
)
STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
WITH SERDEPROPERTIES('mongo.columns.mapping'='{"id":"_id"}')
TBLPROPERTIES('mongo.uri'='mongodb://<hostIP>:27017/test.test');

I have added mongo-java-driver-2.12.2.jar, mongo-hadoop-core-1.3.0.jar and mongo-hadoop-hive-1.3.0.jar as File Resources (the equivalent ADD JAR statements are shown after the error log for reference). But when I execute the query, it fails with the following error:

15/03/11 04:38:24 INFO exec.DDLTask: Use StorageHandler-supplied com.mongodb.hadoop.hive.BSONSerDe for table individuals
15/03/11 04:38:24 ERROR exec.DDLTask: java.lang.NoClassDefFoundError: com/mongodb/util/JSON
    at com.mongodb.hadoop.hive.BSONSerDe.initialize(BSONSerDe.java:107)
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.getDeserializer(MetaStoreUtils.java:339)
    at org.apache.hadoop.hive.ql.metadata.Table.getDeserializerFromMetaStore(Table.java:283)
    at org.apache.hadoop.hive.ql.metadata.Table.getDeserializer(Table.java:276)
    at org.apache.hadoop.hive.ql.metadata.Table.getCols(Table.java:626)
    at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:593)
    at org.apache.hadoop.hive.ql.exec.DDLTask.createTable(DDLTask.java:4194)
    at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:281)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1504)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1271)
    at com.cloudera.beeswax.BeeswaxServiceImpl$RunningQueryState.execute(BeeswaxServiceImpl.java:349)
    at com.cloudera.beeswax.BeeswaxServiceImpl$RunningQueryState$1$1.run(BeeswaxServiceImpl.java:614)
    at com.cloudera.beeswax.BeeswaxServiceImpl$RunningQueryState$1$1.run(BeeswaxServiceImpl.java:603)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:356)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1537)
    at com.cloudera.beeswax.BeeswaxServiceImpl$RunningQueryState$1.run(BeeswaxServiceImpl.java:603)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask.run(FutureTask.java:262)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
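
For reference, registering the same jars from the Hive shell would look roughly like the statements below; the paths are only placeholders for wherever the jars actually sit on the sandbox:

-- placeholder paths; adjust to the actual jar locations
ADD JAR /path/to/mongo-java-driver-2.12.2.jar;
ADD JAR /path/to/mongo-hadoop-core-1.3.0.jar;
ADD JAR /path/to/mongo-hadoop-hive-1.3.0.jar;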

Can somebody please help me figure out what I am missing here?

Thanks in advance.

You need to map all the fields in your MongoDB collection, not only "_id":

CREATE TABLE individuals
( 
  id INT,
  name STRING,
  age INT,
  city STRING,
  hobby STRING
)
STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
WITH SERDEPROPERTIES('mongo.columns.mapping'='{"id":"_id","name":"<corresponding name in your collection>", "age":"<same here>", etc...}')
TBLPROPERTIES('mongo.uri'='mongodb://<hostIP>:27017/test.test');
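
For example, assuming the documents in your collection store those fields under the same names (that is only an assumption about your data), a fully mapped definition would look like this:

CREATE TABLE individuals
( 
  id INT,
  name STRING,
  age INT,
  city STRING,
  hobby STRING
)
STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
WITH SERDEPROPERTIES('mongo.columns.mapping'='{"id":"_id","name":"name","age":"age","city":"city","hobby":"hobby"}')
TBLPROPERTIES('mongo.uri'='mongodb://<hostIP>:27017/test.test');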
