简体   繁体   中英

Storing HDFS data into MongoDB using Pig

I am new to Hadoop and having a requirement to store the Hadoop data into MongoDB. Here I am using Pig to store the data in Hadoop into MongoDB.

I downloaded and registered the following drivers to do this in Pig Grunt shell with the help of given command,

REGISTER /home/miracle/Downloads/mongo-hadoop-pig-2.0.2.jar
REGISTER /home/miracle/Downloads/mongo-java-driver-3.4.2.jar
REGISTER /home/miracle/Downloads/mongo-hadoop-core-2.0.2.jar

After this I successfully got the data from MongoDB using the following command.

raw = LOAD 'mongodb://localhost:27017/pavan.pavan.in' USING com.mongodb.hadoop.pig.MongoLoader;

Then I tried the following command to insert the data from pig bag to MongoDB and got succeeded.

STORE student INTO 'mongodb://localhost:27017/pavan.persons_info' USING com.mongodb.hadoop.pig.MongoInsertStorage('','');

Then I am trying the Mongo Update using the below command.

STORE student INTO 'mongodb://localhost:27017/pavan.persons_info1' USING com.mongodb.hadoop.pig.MongoUpdateStorage(' ','{first:"\$firstname", last:"\$lastname", phone:"\$phone", city:"\$city"}','firstname: chararray,lastname: chararray,phone: chararray,city: chararray');

But I am getting the below error while performing the above command.

 2017-03-22 11:16:42,516 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS 2017-03-22 11:16:43,064 [main] INFO com.mongodb.hadoop.pig.MongoUpdateStorage - Store location config: Configuration: ; for namespace: pavan.persons_info1; hosts: [localhost:27017] 2017-03-22 11:16:43,180 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN 2017-03-22 11:16:43,306 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS 2017-03-22 11:16:43,308 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized 2017-03-22 11:16:43,309 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]} 2017-03-22 11:16:43,310 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false 2017-03-22 11:16:43,314 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1 2017-03-22 11:16:43,314 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1 2017-03-22 11:16:43,415 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS 2017-03-22 11:16:43,419 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized 2017-03-22 11:16:43,423 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job 2017-03-22 11:16:43,425 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3 2017-03-22 11:16:43,438 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process 2017-03-22 11:16:43,603 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/miracle/Downloads/mongo-java-driver-3.0.4.jar to DistributedCache through /tmp/temp159471787/tmp643027494/mongo-java-driver-3.0.4.jar 2017-03-22 11:16:43,687 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/miracle/Downloads/mongo-hadoop-core-2.0.2.jar to DistributedCache through /tmp/temp159471787/tmp-1745369112/mongo-hadoop-core-2.0.2.jar 2017-03-22 11:16:43,822 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/miracle/Downloads/mongo-hadoop-pig-2.0.2.jar to DistributedCache through /tmp/temp159471787/tmp116725398/mongo-hadoop-pig-2.0.2.jar 2017-03-22 11:16:44,693 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/pig-0.16.0/pig-0.16.0-core-h2.jar to DistributedCache through /tmp/temp159471787/tmp499355324/pig-0.16.0-core-h2.jar 2017-03-22 11:16:44,762 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/pig-0.16.0/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp159471787/tmp413788756/automaton-1.11-8.jar 2017-03-22 11:16:44,830 [DataStreamer for file /tmp/temp159471787/tmp-380031198/antlr-runtime-3.4.jar block BP-1303579226-127.0.1.1-1489750707340:blk_1073742392_1568] WARN org.apache.hadoop.hdfs.DFSClient - Caught exception java.lang.InterruptedException at java.lang.Object.wait(Native Method) at java.lang.Thread.join(Thread.java:1249) at java.lang.Thread.join(Thread.java:1323) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:609) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeInternal(DFSOutputStream.java:577) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:573) 2017-03-22 11:16:44,856 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/pig-0.16.0/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp159471787/tmp-380031198/antlr-runtime-3.4.jar 2017-03-22 11:16:44,960 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/pig-0.16.0/lib/joda-time-2.9.3.jar to DistributedCache through /tmp/temp159471787/tmp1163422388/joda-time-2.9.3.jar 2017-03-22 11:16:44,996 [main] INFO com.mongodb.hadoop.pig.MongoUpdateStorage - Store location config: Configuration: ; for namespace: pavan.persons_info1; hosts: [localhost:27017] 2017-03-22 11:16:45,004 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job 2017-03-22 11:16:45,147 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission. 2017-03-22 11:16:45,166 [JobControl] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized 2017-03-22 11:16:45,253 [JobControl] INFO com.mongodb.hadoop.pig.MongoUpdateStorage - Store location config: Configuration: ; for namespace: pavan.persons_info1; hosts: [localhost:27017] 2017-03-22 11:16:45,318 [JobControl] WARN org.apache.hadoop.mapreduce.JobResourceUploader - No job jar file set. User classes may not be found. See Job or Job#setJar(String). 2017-03-22 11:16:45,572 [JobControl] INFO org.apache.pig.builtin.PigStorage - Using PigTextInputFormat 2017-03-22 11:16:45,579 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1 2017-03-22 11:16:45,581 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1 2017-03-22 11:16:45,593 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1 2017-03-22 11:16:45,690 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1 2017-03-22 11:16:45,884 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_local1070093788_0006 2017-03-22 11:16:47,476 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /tmp/hadoop-miracle/mapred/local/1490206606120/mongo-java-driver-3.0.4.jar <- /home/miracle/mongo-java-driver-3.0.4.jar 2017-03-22 11:16:47,534 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://localhost:9000/tmp/temp159471787/tmp643027494/mongo-java-driver-3.0.4.jar as file:/tmp/hadoop-miracle/mapred/local/1490206606120/mongo-java-driver-3.0.4.jar 2017-03-22 11:16:47,534 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /tmp/hadoop-miracle/mapred/local/1490206606121/mongo-hadoop-core-2.0.2.jar <- /home/miracle/mongo-hadoop-core-2.0.2.jar 2017-03-22 11:16:47,674 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://localhost:9000/tmp/temp159471787/tmp-1745369112/mongo-hadoop-core-2.0.2.jar as file:/tmp/hadoop-miracle/mapred/local/1490206606121/mongo-hadoop-core-2.0.2.jar 2017-03-22 11:16:48,194 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /tmp/hadoop-miracle/mapred/local/1490206606122/mongo-hadoop-pig-2.0.2.jar <- /home/miracle/mongo-hadoop-pig-2.0.2.jar 2017-03-22 11:16:48,201 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://localhost:9000/tmp/temp159471787/tmp116725398/mongo-hadoop-pig-2.0.2.jar as file:/tmp/hadoop-miracle/mapred/local/1490206606122/mongo-hadoop-pig-2.0.2.jar 2017-03-22 11:16:48,329 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /tmp/hadoop-miracle/mapred/local/1490206606123/pig-0.16.0-core-h2.jar <- /home/miracle/pig-0.16.0-core-h2.jar 2017-03-22 11:16:48,337 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://localhost:9000/tmp/temp159471787/tmp499355324/pig-0.16.0-core-h2.jar as file:/tmp/hadoop-miracle/mapred/local/1490206606123/pig-0.16.0-core-h2.jar 2017-03-22 11:16:48,338 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /tmp/hadoop-miracle/mapred/local/1490206606124/automaton-1.11-8.jar <- /home/miracle/automaton-1.11-8.jar 2017-03-22 11:16:48,370 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://localhost:9000/tmp/temp159471787/tmp413788756/automaton-1.11-8.jar as file:/tmp/hadoop-miracle/mapred/local/1490206606124/automaton-1.11-8.jar 2017-03-22 11:16:48,371 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /tmp/hadoop-miracle/mapred/local/1490206606125/antlr-runtime-3.4.jar <- /home/miracle/antlr-runtime-3.4.jar 2017-03-22 11:16:48,384 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://localhost:9000/tmp/temp159471787/tmp-380031198/antlr-runtime-3.4.jar as file:/tmp/hadoop-miracle/mapred/local/1490206606125/antlr-runtime-3.4.jar 2017-03-22 11:16:48,389 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /tmp/hadoop-miracle/mapred/local/1490206606126/joda-time-2.9.3.jar <- /home/miracle/joda-time-2.9.3.jar 2017-03-22 11:16:48,409 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://localhost:9000/tmp/temp159471787/tmp1163422388/joda-time-2.9.3.jar as file:/tmp/hadoop-miracle/mapred/local/1490206606126/joda-time-2.9.3.jar 2017-03-22 11:16:48,798 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - file:/tmp/hadoop-miracle/mapred/local/1490206606120/mongo-java-driver-3.0.4.jar 2017-03-22 11:16:48,803 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - file:/tmp/hadoop-miracle/mapred/local/1490206606121/mongo-hadoop-core-2.0.2.jar 2017-03-22 11:16:48,803 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - file:/tmp/hadoop-miracle/mapred/local/1490206606122/mongo-hadoop-pig-2.0.2.jar 2017-03-22 11:16:48,804 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - file:/tmp/hadoop-miracle/mapred/local/1490206606123/pig-0.16.0-core-h2.jar 2017-03-22 11:16:48,806 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - file:/tmp/hadoop-miracle/mapred/local/1490206606124/automaton-1.11-8.jar 2017-03-22 11:16:48,807 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - file:/tmp/hadoop-miracle/mapred/local/1490206606125/antlr-runtime-3.4.jar 2017-03-22 11:16:48,807 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - file:/tmp/hadoop-miracle/mapred/local/1490206606126/joda-time-2.9.3.jar 2017-03-22 11:16:48,807 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://localhost:8080/ 2017-03-22 11:16:48,809 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local1070093788_0006 2017-03-22 11:16:48,812 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases student1 2017-03-22 11:16:48,812 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: student1[7,11] C: R: 2017-03-22 11:16:48,889 [Thread-455] INFO org.apache.hadoop.mapred.LocalJobRunner - OutputCommitter set in config null 2017-03-22 11:16:48,915 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete 2017-03-22 11:16:48,915 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_local1070093788_0006] 2017-03-22 11:16:48,999 [Thread-455] INFO com.mongodb.hadoop.pig.MongoUpdateStorage - Store location config: Configuration: ; for namespace: pavan.persons_info1; hosts: [localhost:27017] 2017-03-22 11:16:49,011 [Thread-455] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address 2017-03-22 11:16:49,013 [Thread-455] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use mapreduce.reduce.markreset.buffer.percent 2017-03-22 11:16:49,013 [Thread-455] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS 2017-03-22 11:16:49,054 [Thread-455] INFO org.apache.hadoop.mapred.LocalJobRunner - OutputCommitter is org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter 2017-03-22 11:16:49,094 [Thread-455] INFO com.mongodb.hadoop.pig.MongoUpdateStorage - Store location config: Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, file:/tmp/hadoop-miracle/mapred/local/localRunner/miracle/job_local1070093788_0006/job_local1070093788_0006.xml; for namespace: pavan.persons_info1; hosts: [localhost:27017] 2017-03-22 11:16:49,104 [Thread-455] INFO com.mongodb.hadoop.output.MongoOutputCommitter - Setting up job. 2017-03-22 11:16:49,126 [Thread-455] INFO org.apache.hadoop.mapred.LocalJobRunner - Waiting for map tasks 2017-03-22 11:16:49,127 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.LocalJobRunner - Starting task: attempt_local1070093788_0006_m_000000_0 2017-03-22 11:16:49,253 [LocalJobRunner Map Task Executor #0] INFO com.mongodb.hadoop.pig.MongoUpdateStorage - Store location config: Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, file:/tmp/hadoop-miracle/mapred/local/localRunner/miracle/job_local1070093788_0006/job_local1070093788_0006.xml; for namespace: pavan.persons_info1; hosts: [localhost:27017] 2017-03-22 11:16:49,279 [LocalJobRunner Map Task Executor #0] INFO com.mongodb.hadoop.pig.MongoUpdateStorage - Store location config: Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, file:/tmp/hadoop-miracle/mapred/local/localRunner/miracle/job_local1070093788_0006/job_local1070093788_0006.xml; for namespace: pavan.persons_info1; hosts: [localhost:27017] 2017-03-22 11:16:49,290 [LocalJobRunner Map Task Executor #0] INFO com.mongodb.hadoop.output.MongoOutputCommitter - Setting up task. 2017-03-22 11:16:49,296 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.Task - Using ResourceCalculatorProcessTree : [ ] 2017-03-22 11:16:49,340 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - Processing split: Number of splits :1 Total Length = 212 Input split[0]: Length = 212 ClassName: org.apache.hadoop.mapreduce.lib.input.FileSplit Locations: ----------------------- 2017-03-22 11:16:49,415 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.builtin.PigStorage - Using PigTextInputFormat 2017-03-22 11:16:49,417 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader - Current split being processed hdfs://localhost:9000/input/student_dir/student_Info.txt:0+212 2017-03-22 11:16:49,459 [LocalJobRunner Map Task Executor #0] INFO com.mongodb.hadoop.pig.MongoUpdateStorage - Store location config: Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, file:/tmp/hadoop-miracle/mapred/local/localRunner/miracle/job_local1070093788_0006/job_local1070093788_0006.xml; for namespace: pavan.persons_info1; hosts: [localhost:27017] 2017-03-22 11:16:49,684 [LocalJobRunner Map Task Executor #0] INFO org.mongodb.driver.cluster - Cluster created with settings {hosts=[localhost:27017], mode=SINGLE, requiredClusterType=UNKNOWN, serverSelectionTimeout='30000 ms', maxWaitQueueSize=500} 2017-03-22 11:16:50,484 [LocalJobRunner Map Task Executor #0] INFO com.mongodb.hadoop.output.MongoRecordWriter - Writing to temporary file: /tmp/hadoop-miracle/attempt_local1070093788_0006_m_000000_0/_MONGO_OUT_TEMP/_out 2017-03-22 11:16:50,516 [LocalJobRunner Map Task Executor #0] INFO com.mongodb.hadoop.pig.MongoUpdateStorage - Preparing to write to com.mongodb.hadoop.output.MongoRecordWriter@1fd6ae6 2017-03-22 11:16:50,736 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.impl.util.SpillableMemoryManager - Selected heap (Tenured Gen) of size 699072512 to monitor. collectionUsageThreshold = 489350752, usageThreshold = 489350752 2017-03-22 11:16:50,739 [LocalJobRunner Map Task Executor #0] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized 2017-03-22 11:16:50,746 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: student1[7,11] C: R: 2017-03-22 11:16:50,880 [Thread-455] INFO org.apache.hadoop.mapred.LocalJobRunner - map task executor complete. 2017-03-22 11:16:50,919 [Thread-455] INFO com.mongodb.hadoop.pig.MongoUpdateStorage - Store location config: Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, file:/tmp/hadoop-miracle/mapred/local/localRunner/miracle/job_local1070093788_0006/job_local1070093788_0006.xml; for namespace: pavan.persons_info1; hosts: [localhost:27017] 2017-03-22 11:16:50,963 [Thread-455] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local1070093788_0006 java.lang.Exception: java.io.IOException: java.io.IOException: Couldn't convert tuple to bson: at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522) Caused by: java.io.IOException: java.io.IOException: Couldn't convert tuple to bson: at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.StoreFuncDecorator.putNext(StoreFuncDecorator.java:83) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:144) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:97) at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:658) at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89) at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:261) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:65) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException: Couldn't convert tuple to bson: at com.mongodb.hadoop.pig.MongoUpdateStorage.putNext(MongoUpdateStorage.java:165) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.StoreFuncDecorator.putNext(StoreFuncDecorator.java:75) ... 17 more Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 at java.util.ArrayList.rangeCheck(ArrayList.java:653) at java.util.ArrayList.get(ArrayList.java:429) at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:117) at com.mongodb.hadoop.pig.JSONPigReplace.substitute(JSONPigReplace.java:120) at com.mongodb.hadoop.pig.MongoUpdateStorage.putNext(MongoUpdateStorage.java:142) ... 18 more 2017-03-22 11:16:53,944 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure. 2017-03-22 11:16:53,944 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local1070093788_0006 has failed! Stop running all dependent jobs 2017-03-22 11:16:53,945 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete 2017-03-22 11:16:53,949 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized 2017-03-22 11:16:53,954 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized 2017-03-22 11:16:53,962 [main] ERROR org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) failed! 2017-03-22 11:16:53,981 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics: HadoopVersion PigVersion UserId StartedAt FinishedAt Features 2.7.3 0.16.0 miracle 2017-03-22 11:16:43 2017-03-22 11:16:53 UNKNOWN Failed! Failed Jobs: JobId Alias Feature Message Outputs job_local1070093788_0006 student1 MAP_ONLY Message: Job failed! mongodb://localhost:27017/pavan.persons_info1, Input(s): Failed to read data from "hdfs://localhost:9000/input/student_dir/student_Info.txt" Output(s): Failed to produce result in "mongodb://localhost:27017/pavan.persons_info1" Counters: Total records written : 0 Total bytes written : 0 Spillable Memory Manager spill count : 0 Total bags proactively spilled: 0 Total records proactively spilled: 0 Job DAG: job_local1070093788_0006 2017-03-22 11:16:53,983 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed! 2017-03-22 11:16:54,004 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1002: Unable to store alias student1 Details at logfile: /home/miracle/pig_1490205716326.log 

Here is the input that I have dumped here,

Input(s):

Successfully read 6 records (5378419 bytes) from: "hdfs://localhost:9000/input/student_dir/student_Info.txt"

Output(s): Successfully stored 6 records (5378449 bytes) in: "hdfs://localhost:9000/tmp/temp-1419179625/tmp882976412"

Counters: Total records written : 6 Total bytes written : 5378449 Spillable Memory Manager spill count : 0 Total bags proactively spilled: 0 Total records proactively spilled: 0

Job DAG: job_local1866034015_0001

2017-03-23 02:43:37,677 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized 2017-03-23 02:43:37,681 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized 2017-03-23 02:43:37,689 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized 2017-03-23 02:43:37,736 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success! 2017-03-23 02:43:37,748 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS 2017-03-23 02:43:37,751 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized 2017-03-23 02:43:37,793 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1 2017-03-23 02:43:37,793 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1 (Rajiv,Reddy,9848022337,Hyderabad) (siddarth,Battacharya,9848022338,Kolkata) (Rajesh,Khanna,9848022339,Delhi) (Preethi,Agarwal,9848022330,Pune) (Trupthi,Mohanthy,9848022336,Bhuwaneshwar) (Archana,Mishra,9848022335,Chennai.)

I don't know What to do next please give me any suggestions on this.

I do not use Mongo , but from HBase experience and From the error it looks like few fields are not flatten 'ed enough for it to fit the column family. And also few of columns you are trying to STORE don't exist. See to DUMP the data instead of STORE and check if it is matching the STORE structure you have applied.

Caused by: java.io.IOException: Couldn't convert tuple to bson: at com.mongodb.hadoop.pig.MongoUpdateStorage.putNext(MongoUpdateStorage.java:165) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.StoreFuncDecorator.putNext(StoreFuncDecorator.java:75) ... 17 more Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 at java.util.ArrayList.rangeCheck(ArrayList.java:653) at java.util.ArrayList.get(ArrayList.java:429) at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:117) at com.mongodb.hadoop.pig.JSONPigReplace.substitute(JSONPigReplace.java:120) at com.mongodb.hadoop.pig.MongoUpdateStorage.putNext(MongoUpdateStorage.java:142) ... 18 more

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM