Storing HDFS data into MongoDB using Pig

I am new to Hadoop and I need to store Hadoop data into MongoDB. Here I am using Pig to store the data from Hadoop into MongoDB.

I downloaded the following drivers and registered them in the Pig Grunt shell with the commands given below:

REGISTER /home/miracle/Downloads/mongo-hadoop-pig-2.0.2.jar
REGISTER /home/miracle/Downloads/mongo-java-driver-3.4.2.jar
REGISTER /home/miracle/Downloads/mongo-hadoop-core-2.0.2.jar

After this, I successfully read the data from MongoDB using the following command:

raw = LOAD 'mongodb://localhost:27017/pavan.pavan.in' USING com.mongodb.hadoop.pig.MongoLoader;
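
For reference, the same load can also be given an explicit schema so the relation has named, typed fields rather than a single map-like document field (if I recall the mongo-hadoop Pig loader's default behaviour correctly). This is only a hedged sketch; the field names are illustrative, not taken from the post:

-- Sketch only: the schema string is an assumption, adjust to the real documents.
raw_typed = LOAD 'mongodb://localhost:27017/pavan.pavan.in'
    USING com.mongodb.hadoop.pig.MongoLoader('firstname:chararray, lastname:chararray, phone:chararray, city:chararray');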

Then I tried the following command to insert the data from the Pig relation into MongoDB, and it succeeded:

STORE student INTO 'mongodb://localhost:27017/pavan.persons_info' USING com.mongodb.hadoop.pig.MongoInsertStorage('','');
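
The student relation itself is not shown in the question. A minimal sketch of how it could have been built from the HDFS file that appears later in the log, assuming comma-separated records like the ones in the dump further below, is:

-- Sketch only: the path comes from the log below; the schema is assumed
-- from the four comma-separated fields visible in the dumped records.
student = LOAD 'hdfs://localhost:9000/input/student_dir/student_Info.txt'
    USING PigStorage(',')
    AS (firstname:chararray, lastname:chararray, phone:chararray, city:chararray);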

Then I tried a Mongo update using the command below:

STORE student INTO 'mongodb://localhost:27017/pavan.persons_info1' USING com.mongodb.hadoop.pig.MongoUpdateStorage(' ','{first:"\$firstname", last:"\$lastname", phone:"\$phone", city:"\$city"}','firstname: chararray,lastname: chararray,phone: chararray,city: chararray');
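
For comparison, as far as I recall the mongo-hadoop Pig documentation, MongoUpdateStorage takes a query that selects the documents to update, an update document (usually with an operator such as \$set), and the schema of the tuples being stored. A hedged sketch in that form, with the field names assumed from the post, would be:

-- Sketch only, not the poster's script: the query/update/schema arguments are
-- assumptions based on the mongo-hadoop Pig docs and the fields named above.
STORE student INTO 'mongodb://localhost:27017/pavan.persons_info1'
    USING com.mongodb.hadoop.pig.MongoUpdateStorage(
        '{phone:"\$phone"}',
        '{\$set:{first:"\$firstname", last:"\$lastname", city:"\$city"}}',
        'firstname:chararray, lastname:chararray, phone:chararray, city:chararray');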

But I get the following error while running the above command:

2017-03-22 11:16:42,516 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2017-03-22 11:16:43,064 [main] INFO com.mongodb.hadoop.pig.MongoUpdateStorage - Store location config: Configuration: ; for namespace: pavan.persons_info1; hosts: [localhost:27017]
2017-03-22 11:16:43,180 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2017-03-22 11:16:43,306 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2017-03-22 11:16:43,308 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2017-03-22 11:16:43,309 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2017-03-22 11:16:43,310 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2017-03-22 11:16:43,314 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2017-03-22 11:16:43,314 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2017-03-22 11:16:43,415 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2017-03-22 11:16:43,419 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2017-03-22 11:16:43,423 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2017-03-22 11:16:43,425 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2017-03-22 11:16:43,438 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2017-03-22 11:16:43,603 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/miracle/Downloads/mongo-java-driver-3.0.4.jar to DistributedCache through /tmp/temp159471787/tmp643027494/mongo-java-driver-3.0.4.jar
2017-03-22 11:16:43,687 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/miracle/Downloads/mongo-hadoop-core-2.0.2.jar to DistributedCache through /tmp/temp159471787/tmp-1745369112/mongo-hadoop-core-2.0.2.jar
2017-03-22 11:16:43,822 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/miracle/Downloads/mongo-hadoop-pig-2.0.2.jar to DistributedCache through /tmp/temp159471787/tmp116725398/mongo-hadoop-pig-2.0.2.jar
2017-03-22 11:16:44,693 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/pig-0.16.0/pig-0.16.0-core-h2.jar to DistributedCache through /tmp/temp159471787/tmp499355324/pig-0.16.0-core-h2.jar
2017-03-22 11:16:44,762 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/pig-0.16.0/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp159471787/tmp413788756/automaton-1.11-8.jar
2017-03-22 11:16:44,830 [DataStreamer for file /tmp/temp159471787/tmp-380031198/antlr-runtime-3.4.jar block BP-1303579226-127.0.1.1-1489750707340:blk_1073742392_1568] WARN org.apache.hadoop.hdfs.DFSClient - Caught exception
java.lang.InterruptedException
    at java.lang.Object.wait(Native Method)
    at java.lang.Thread.join(Thread.java:1249)
    at java.lang.Thread.join(Thread.java:1323)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:609)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeInternal(DFSOutputStream.java:577)
    at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:573)
2017-03-22 11:16:44,856 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/pig-0.16.0/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp159471787/tmp-380031198/antlr-runtime-3.4.jar
2017-03-22 11:16:44,960 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/pig-0.16.0/lib/joda-time-2.9.3.jar to DistributedCache through /tmp/temp159471787/tmp1163422388/joda-time-2.9.3.jar
2017-03-22 11:16:44,996 [main] INFO com.mongodb.hadoop.pig.MongoUpdateStorage - Store location config: Configuration: ; for namespace: pavan.persons_info1; hosts: [localhost:27017]
2017-03-22 11:16:45,004 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2017-03-22 11:16:45,147 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2017-03-22 11:16:45,166 [JobControl] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2017-03-22 11:16:45,253 [JobControl] INFO com.mongodb.hadoop.pig.MongoUpdateStorage - Store location config: Configuration: ; for namespace: pavan.persons_info1; hosts: [localhost:27017]
2017-03-22 11:16:45,318 [JobControl] WARN org.apache.hadoop.mapreduce.JobResourceUploader - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2017-03-22 11:16:45,572 [JobControl] INFO org.apache.pig.builtin.PigStorage - Using PigTextInputFormat
2017-03-22 11:16:45,579 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2017-03-22 11:16:45,581 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
2017-03-22 11:16:45,593 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2017-03-22 11:16:45,690 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2017-03-22 11:16:45,884 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_local1070093788_0006
2017-03-22 11:16:47,476 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /tmp/hadoop-miracle/mapred/local/1490206606120/mongo-java-driver-3.0.4.jar <- /home/miracle/mongo-java-driver-3.0.4.jar
2017-03-22 11:16:47,534 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://localhost:9000/tmp/temp159471787/tmp643027494/mongo-java-driver-3.0.4.jar as file:/tmp/hadoop-miracle/mapred/local/1490206606120/mongo-java-driver-3.0.4.jar
2017-03-22 11:16:47,534 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /tmp/hadoop-miracle/mapred/local/1490206606121/mongo-hadoop-core-2.0.2.jar <- /home/miracle/mongo-hadoop-core-2.0.2.jar
2017-03-22 11:16:47,674 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://localhost:9000/tmp/temp159471787/tmp-1745369112/mongo-hadoop-core-2.0.2.jar as file:/tmp/hadoop-miracle/mapred/local/1490206606121/mongo-hadoop-core-2.0.2.jar
2017-03-22 11:16:48,194 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /tmp/hadoop-miracle/mapred/local/1490206606122/mongo-hadoop-pig-2.0.2.jar <- /home/miracle/mongo-hadoop-pig-2.0.2.jar
2017-03-22 11:16:48,201 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://localhost:9000/tmp/temp159471787/tmp116725398/mongo-hadoop-pig-2.0.2.jar as file:/tmp/hadoop-miracle/mapred/local/1490206606122/mongo-hadoop-pig-2.0.2.jar
2017-03-22 11:16:48,329 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /tmp/hadoop-miracle/mapred/local/1490206606123/pig-0.16.0-core-h2.jar <- /home/miracle/pig-0.16.0-core-h2.jar
2017-03-22 11:16:48,337 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://localhost:9000/tmp/temp159471787/tmp499355324/pig-0.16.0-core-h2.jar as file:/tmp/hadoop-miracle/mapred/local/1490206606123/pig-0.16.0-core-h2.jar
2017-03-22 11:16:48,338 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /tmp/hadoop-miracle/mapred/local/1490206606124/automaton-1.11-8.jar <- /home/miracle/automaton-1.11-8.jar
2017-03-22 11:16:48,370 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://localhost:9000/tmp/temp159471787/tmp413788756/automaton-1.11-8.jar as file:/tmp/hadoop-miracle/mapred/local/1490206606124/automaton-1.11-8.jar
2017-03-22 11:16:48,371 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /tmp/hadoop-miracle/mapred/local/1490206606125/antlr-runtime-3.4.jar <- /home/miracle/antlr-runtime-3.4.jar
2017-03-22 11:16:48,384 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://localhost:9000/tmp/temp159471787/tmp-380031198/antlr-runtime-3.4.jar as file:/tmp/hadoop-miracle/mapred/local/1490206606125/antlr-runtime-3.4.jar
2017-03-22 11:16:48,389 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /tmp/hadoop-miracle/mapred/local/1490206606126/joda-time-2.9.3.jar <- /home/miracle/joda-time-2.9.3.jar
2017-03-22 11:16:48,409 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://localhost:9000/tmp/temp159471787/tmp1163422388/joda-time-2.9.3.jar as file:/tmp/hadoop-miracle/mapred/local/1490206606126/joda-time-2.9.3.jar
2017-03-22 11:16:48,798 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - file:/tmp/hadoop-miracle/mapred/local/1490206606120/mongo-java-driver-3.0.4.jar
2017-03-22 11:16:48,803 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - file:/tmp/hadoop-miracle/mapred/local/1490206606121/mongo-hadoop-core-2.0.2.jar
2017-03-22 11:16:48,803 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - file:/tmp/hadoop-miracle/mapred/local/1490206606122/mongo-hadoop-pig-2.0.2.jar
2017-03-22 11:16:48,804 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - file:/tmp/hadoop-miracle/mapred/local/1490206606123/pig-0.16.0-core-h2.jar
2017-03-22 11:16:48,806 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - file:/tmp/hadoop-miracle/mapred/local/1490206606124/automaton-1.11-8.jar
2017-03-22 11:16:48,807 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - file:/tmp/hadoop-miracle/mapred/local/1490206606125/antlr-runtime-3.4.jar
2017-03-22 11:16:48,807 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - file:/tmp/hadoop-miracle/mapred/local/1490206606126/joda-time-2.9.3.jar
2017-03-22 11:16:48,807 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://localhost:8080/
2017-03-22 11:16:48,809 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local1070093788_0006
2017-03-22 11:16:48,812 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases student1
2017-03-22 11:16:48,812 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: student1[7,11] C: R:
2017-03-22 11:16:48,889 [Thread-455] INFO org.apache.hadoop.mapred.LocalJobRunner - OutputCommitter set in config null
2017-03-22 11:16:48,915 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2017-03-22 11:16:48,915 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_local1070093788_0006]
2017-03-22 11:16:48,999 [Thread-455] INFO com.mongodb.hadoop.pig.MongoUpdateStorage - Store location config: Configuration: ; for namespace: pavan.persons_info1; hosts: [localhost:27017]
2017-03-22 11:16:49,011 [Thread-455] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address
2017-03-22 11:16:49,013 [Thread-455] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use mapreduce.reduce.markreset.buffer.percent
2017-03-22 11:16:49,013 [Thread-455] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2017-03-22 11:16:49,054 [Thread-455] INFO org.apache.hadoop.mapred.LocalJobRunner - OutputCommitter is org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter
2017-03-22 11:16:49,094 [Thread-455] INFO com.mongodb.hadoop.pig.MongoUpdateStorage - Store location config: Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, file:/tmp/hadoop-miracle/mapred/local/localRunner/miracle/job_local1070093788_0006/job_local1070093788_0006.xml; for namespace: pavan.persons_info1; hosts: [localhost:27017]
2017-03-22 11:16:49,104 [Thread-455] INFO com.mongodb.hadoop.output.MongoOutputCommitter - Setting up job.
2017-03-22 11:16:49,126 [Thread-455] INFO org.apache.hadoop.mapred.LocalJobRunner - Waiting for map tasks
2017-03-22 11:16:49,127 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.LocalJobRunner - Starting task: attempt_local1070093788_0006_m_000000_0
2017-03-22 11:16:49,253 [LocalJobRunner Map Task Executor #0] INFO com.mongodb.hadoop.pig.MongoUpdateStorage - Store location config: Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, file:/tmp/hadoop-miracle/mapred/local/localRunner/miracle/job_local1070093788_0006/job_local1070093788_0006.xml; for namespace: pavan.persons_info1; hosts: [localhost:27017]
2017-03-22 11:16:49,279 [LocalJobRunner Map Task Executor #0] INFO com.mongodb.hadoop.pig.MongoUpdateStorage - Store location config: Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, file:/tmp/hadoop-miracle/mapred/local/localRunner/miracle/job_local1070093788_0006/job_local1070093788_0006.xml; for namespace: pavan.persons_info1; hosts: [localhost:27017]
2017-03-22 11:16:49,290 [LocalJobRunner Map Task Executor #0] INFO com.mongodb.hadoop.output.MongoOutputCommitter - Setting up task.
2017-03-22 11:16:49,296 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.Task - Using ResourceCalculatorProcessTree : [ ]
2017-03-22 11:16:49,340 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - Processing split: Number of splits :1 Total Length = 212 Input split[0]: Length = 212 ClassName: org.apache.hadoop.mapreduce.lib.input.FileSplit Locations: -----------------------
2017-03-22 11:16:49,415 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.builtin.PigStorage - Using PigTextInputFormat
2017-03-22 11:16:49,417 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader - Current split being processed hdfs://localhost:9000/input/student_dir/student_Info.txt:0+212
2017-03-22 11:16:49,459 [LocalJobRunner Map Task Executor #0] INFO com.mongodb.hadoop.pig.MongoUpdateStorage - Store location config: Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, file:/tmp/hadoop-miracle/mapred/local/localRunner/miracle/job_local1070093788_0006/job_local1070093788_0006.xml; for namespace: pavan.persons_info1; hosts: [localhost:27017]
2017-03-22 11:16:49,684 [LocalJobRunner Map Task Executor #0] INFO org.mongodb.driver.cluster - Cluster created with settings {hosts=[localhost:27017], mode=SINGLE, requiredClusterType=UNKNOWN, serverSelectionTimeout='30000 ms', maxWaitQueueSize=500}
2017-03-22 11:16:50,484 [LocalJobRunner Map Task Executor #0] INFO com.mongodb.hadoop.output.MongoRecordWriter - Writing to temporary file: /tmp/hadoop-miracle/attempt_local1070093788_0006_m_000000_0/_MONGO_OUT_TEMP/_out
2017-03-22 11:16:50,516 [LocalJobRunner Map Task Executor #0] INFO com.mongodb.hadoop.pig.MongoUpdateStorage - Preparing to write to com.mongodb.hadoop.output.MongoRecordWriter@1fd6ae6
2017-03-22 11:16:50,736 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.impl.util.SpillableMemoryManager - Selected heap (Tenured Gen) of size 699072512 to monitor. collectionUsageThreshold = 489350752, usageThreshold = 489350752
2017-03-22 11:16:50,739 [LocalJobRunner Map Task Executor #0] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2017-03-22 11:16:50,746 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: student1[7,11] C: R:
2017-03-22 11:16:50,880 [Thread-455] INFO org.apache.hadoop.mapred.LocalJobRunner - map task executor complete.
2017-03-22 11:16:50,919 [Thread-455] INFO com.mongodb.hadoop.pig.MongoUpdateStorage - Store location config: Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, file:/tmp/hadoop-miracle/mapred/local/localRunner/miracle/job_local1070093788_0006/job_local1070093788_0006.xml; for namespace: pavan.persons_info1; hosts: [localhost:27017]
2017-03-22 11:16:50,963 [Thread-455] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local1070093788_0006
java.lang.Exception: java.io.IOException: java.io.IOException: Couldn't convert tuple to bson:
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.io.IOException: java.io.IOException: Couldn't convert tuple to bson:
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.StoreFuncDecorator.putNext(StoreFuncDecorator.java:83)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:144)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:97)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:658)
    at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89)
    at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:261)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:65)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Couldn't convert tuple to bson:
    at com.mongodb.hadoop.pig.MongoUpdateStorage.putNext(MongoUpdateStorage.java:165)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.StoreFuncDecorator.putNext(StoreFuncDecorator.java:75)
    ... 17 more
Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
    at java.util.ArrayList.rangeCheck(ArrayList.java:653)
    at java.util.ArrayList.get(ArrayList.java:429)
    at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:117)
    at com.mongodb.hadoop.pig.JSONPigReplace.substitute(JSONPigReplace.java:120)
    at com.mongodb.hadoop.pig.MongoUpdateStorage.putNext(MongoUpdateStorage.java:142)
    ... 18 more
2017-03-22 11:16:53,944 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2017-03-22 11:16:53,944 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local1070093788_0006 has failed! Stop running all dependent jobs
2017-03-22 11:16:53,945 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2017-03-22 11:16:53,949 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2017-03-22 11:16:53,954 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2017-03-22 11:16:53,962 [main] ERROR org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) failed!
2017-03-22 11:16:53,981 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:

HadoopVersion  PigVersion  UserId   StartedAt            FinishedAt           Features
2.7.3          0.16.0      miracle  2017-03-22 11:16:43  2017-03-22 11:16:53  UNKNOWN

Failed!

Failed Jobs:
JobId                     Alias     Feature   Message               Outputs
job_local1070093788_0006  student1  MAP_ONLY  Message: Job failed!  mongodb://localhost:27017/pavan.persons_info1,

Input(s):
Failed to read data from "hdfs://localhost:9000/input/student_dir/student_Info.txt"

Output(s):
Failed to produce result in "mongodb://localhost:27017/pavan.persons_info1"

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG:
job_local1070093788_0006

2017-03-22 11:16:53,983 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2017-03-22 11:16:54,004 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1002: Unable to store alias student1
Details at logfile: /home/miracle/pig_1490205716326.log

Here is the input that I dumped:

Input(s):

Successfully read 6 records (5378419 bytes) from: "hdfs://localhost:9000/input/student_dir/student_Info.txt"

Output(s):

Successfully stored 6 records (5378449 bytes) in: "hdfs://localhost:9000/tmp/temp-1419179625/tmp882976412"

Counters:
Total records written : 6
Total bytes written : 5378449
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG: job_local1866034015_0001

2017-03-23 02:43:37,677 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2017-03-23 02:43:37,681 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2017-03-23 02:43:37,689 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2017-03-23 02:43:37,736 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
2017-03-23 02:43:37,748 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS
2017-03-23 02:43:37,751 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2017-03-23 02:43:37,793 [main] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1
2017-03-23 02:43:37,793 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1
(Rajiv,Reddy,9848022337,Hyderabad)
(siddarth,Battacharya,9848022338,Kolkata)
(Rajesh,Khanna,9848022339,Delhi)
(Preethi,Agarwal,9848022330,Pune)
(Trupthi,Mohanthy,9848022336,Bhuwaneshwar)
(Archana,Mishra,9848022335,Chennai.)

I don't know what to do next. Please give me any suggestions on this.

I do not use Mongo, but from HBase experience and from the error, it looks like a few fields are not flattened enough to fit the column family, and a few of the columns you are trying to STORE do not exist. DUMP the data instead of STOREing it and check whether it matches the STORE structure you have applied (see the sketch after the quoted trace below).

Caused by: java.io.IOException: Couldn't convert tuple to bson:
    at com.mongodb.hadoop.pig.MongoUpdateStorage.putNext(MongoUpdateStorage.java:165)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.StoreFuncDecorator.putNext(StoreFuncDecorator.java:75)
    ... 17 more
Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
    at java.util.ArrayList.rangeCheck(ArrayList.java:653)
    at java.util.ArrayList.get(ArrayList.java:429)
    at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:117)
    at com.mongodb.hadoop.pig.JSONPigReplace.substitute(JSONPigReplace.java:120)
    at com.mongodb.hadoop.pig.MongoUpdateStorage.putNext(MongoUpdateStorage.java:142)
    ... 18 more
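
In the same spirit: the IndexOutOfBoundsException (Index: 1, Size: 1) thrown from DefaultTuple.get suggests the tuples reaching MongoUpdateStorage carry only one field, while the update template references four (firstname, lastname, phone, city). A minimal sketch of how that could be checked and, if needed, fixed, assuming the comma-separated layout shown in the dump above, is:

-- Sketch only: verify the relation's structure before storing it.
DESCRIBE student;   -- should list the four fields used in the update template
DUMP student;       -- should print four-field tuples, not one long chararray

-- If each tuple has a single field, reload with an explicit delimiter and
-- schema (path and field names assumed from the post):
student = LOAD 'hdfs://localhost:9000/input/student_dir/student_Info.txt'
    USING PigStorage(',')
    AS (firstname:chararray, lastname:chararray, phone:chararray, city:chararray);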
