簡體   English   中英

使用Pig將HDFS數據存儲到MongoDB中

[英]Storing HDFS data into MongoDB using Pig

我是Hadoop的新手,需要將Hadoop數據存儲到MongoDB中。 在這里,我使用Pig將Hadoop中的數據存儲到MongoDB中。

我已經下載並注冊了以下驅動程序,以便在給定命令的幫助下在Pig Grunt shell中執行此操作,

REGISTER /home/miracle/Downloads/mongo-hadoop-pig-2.0.2.jar
REGISTER /home/miracle/Downloads/mongo-java-driver-3.4.2.jar
REGISTER /home/miracle/Downloads/mongo-hadoop-core-2.0.2.jar

在此之后,我使用以下命令成功從MongoDB獲取數據。

raw = LOAD 'mongodb://localhost:27017/pavan.pavan.in' USING com.mongodb.hadoop.pig.MongoLoader;

然后我嘗試使用以下命令將pig bag中的數據插入MongoDB並成功完成。

STORE student INTO 'mongodb://localhost:27017/pavan.persons_info' USING com.mongodb.hadoop.pig.MongoInsertStorage('','');

然后我使用下面的命令嘗試Mongo Update。

STORE student INTO 'mongodb://localhost:27017/pavan.persons_info1' USING com.mongodb.hadoop.pig.MongoUpdateStorage(' ','{first:"\$firstname", last:"\$lastname", phone:"\$phone", city:"\$city"}','firstname: chararray,lastname: chararray,phone: chararray,city: chararray');

但是在執行上述命令時,我收到以下錯誤。

 2017-03-22 11:16:42,516 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS 2017-03-22 11:16:43,064 [main] INFO com.mongodb.hadoop.pig.MongoUpdateStorage - Store location config: Configuration: ; for namespace: pavan.persons_info1; hosts: [localhost:27017] 2017-03-22 11:16:43,180 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN 2017-03-22 11:16:43,306 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS 2017-03-22 11:16:43,308 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized 2017-03-22 11:16:43,309 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]} 2017-03-22 11:16:43,310 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false 2017-03-22 11:16:43,314 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1 2017-03-22 11:16:43,314 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1 2017-03-22 11:16:43,415 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS 2017-03-22 11:16:43,419 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized 2017-03-22 11:16:43,423 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job 2017-03-22 11:16:43,425 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3 2017-03-22 11:16:43,438 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process 2017-03-22 11:16:43,603 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/miracle/Downloads/mongo-java-driver-3.0.4.jar to DistributedCache through /tmp/temp159471787/tmp643027494/mongo-java-driver-3.0.4.jar 2017-03-22 11:16:43,687 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/miracle/Downloads/mongo-hadoop-core-2.0.2.jar to DistributedCache through /tmp/temp159471787/tmp-1745369112/mongo-hadoop-core-2.0.2.jar 2017-03-22 11:16:43,822 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/home/miracle/Downloads/mongo-hadoop-pig-2.0.2.jar to DistributedCache through /tmp/temp159471787/tmp116725398/mongo-hadoop-pig-2.0.2.jar 2017-03-22 11:16:44,693 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/pig-0.16.0/pig-0.16.0-core-h2.jar to DistributedCache through /tmp/temp159471787/tmp499355324/pig-0.16.0-core-h2.jar 2017-03-22 11:16:44,762 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/pig-0.16.0/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp159471787/tmp413788756/automaton-1.11-8.jar 2017-03-22 11:16:44,830 [DataStreamer for file /tmp/temp159471787/tmp-380031198/antlr-runtime-3.4.jar block BP-1303579226-127.0.1.1-1489750707340:blk_1073742392_1568] WARN org.apache.hadoop.hdfs.DFSClient - Caught exception java.lang.InterruptedException at java.lang.Object.wait(Native Method) at java.lang.Thread.join(Thread.java:1249) at java.lang.Thread.join(Thread.java:1323) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeResponder(DFSOutputStream.java:609) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.closeInternal(DFSOutputStream.java:577) at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:573) 2017-03-22 11:16:44,856 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/pig-0.16.0/lib/antlr-runtime-3.4.jar to DistributedCache through /tmp/temp159471787/tmp-380031198/antlr-runtime-3.4.jar 2017-03-22 11:16:44,960 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig/pig-0.16.0/lib/joda-time-2.9.3.jar to DistributedCache through /tmp/temp159471787/tmp1163422388/joda-time-2.9.3.jar 2017-03-22 11:16:44,996 [main] INFO com.mongodb.hadoop.pig.MongoUpdateStorage - Store location config: Configuration: ; for namespace: pavan.persons_info1; hosts: [localhost:27017] 2017-03-22 11:16:45,004 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job 2017-03-22 11:16:45,147 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission. 2017-03-22 11:16:45,166 [JobControl] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized 2017-03-22 11:16:45,253 [JobControl] INFO com.mongodb.hadoop.pig.MongoUpdateStorage - Store location config: Configuration: ; for namespace: pavan.persons_info1; hosts: [localhost:27017] 2017-03-22 11:16:45,318 [JobControl] WARN org.apache.hadoop.mapreduce.JobResourceUploader - No job jar file set. User classes may not be found. See Job or Job#setJar(String). 2017-03-22 11:16:45,572 [JobControl] INFO org.apache.pig.builtin.PigStorage - Using PigTextInputFormat 2017-03-22 11:16:45,579 [JobControl] INFO org.apache.hadoop.mapreduce.lib.input.FileInputFormat - Total input paths to process : 1 2017-03-22 11:16:45,581 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths to process : 1 2017-03-22 11:16:45,593 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1 2017-03-22 11:16:45,690 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1 2017-03-22 11:16:45,884 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_local1070093788_0006 2017-03-22 11:16:47,476 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /tmp/hadoop-miracle/mapred/local/1490206606120/mongo-java-driver-3.0.4.jar <- /home/miracle/mongo-java-driver-3.0.4.jar 2017-03-22 11:16:47,534 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://localhost:9000/tmp/temp159471787/tmp643027494/mongo-java-driver-3.0.4.jar as file:/tmp/hadoop-miracle/mapred/local/1490206606120/mongo-java-driver-3.0.4.jar 2017-03-22 11:16:47,534 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /tmp/hadoop-miracle/mapred/local/1490206606121/mongo-hadoop-core-2.0.2.jar <- /home/miracle/mongo-hadoop-core-2.0.2.jar 2017-03-22 11:16:47,674 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://localhost:9000/tmp/temp159471787/tmp-1745369112/mongo-hadoop-core-2.0.2.jar as file:/tmp/hadoop-miracle/mapred/local/1490206606121/mongo-hadoop-core-2.0.2.jar 2017-03-22 11:16:48,194 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /tmp/hadoop-miracle/mapred/local/1490206606122/mongo-hadoop-pig-2.0.2.jar <- /home/miracle/mongo-hadoop-pig-2.0.2.jar 2017-03-22 11:16:48,201 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://localhost:9000/tmp/temp159471787/tmp116725398/mongo-hadoop-pig-2.0.2.jar as file:/tmp/hadoop-miracle/mapred/local/1490206606122/mongo-hadoop-pig-2.0.2.jar 2017-03-22 11:16:48,329 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /tmp/hadoop-miracle/mapred/local/1490206606123/pig-0.16.0-core-h2.jar <- /home/miracle/pig-0.16.0-core-h2.jar 2017-03-22 11:16:48,337 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://localhost:9000/tmp/temp159471787/tmp499355324/pig-0.16.0-core-h2.jar as file:/tmp/hadoop-miracle/mapred/local/1490206606123/pig-0.16.0-core-h2.jar 2017-03-22 11:16:48,338 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /tmp/hadoop-miracle/mapred/local/1490206606124/automaton-1.11-8.jar <- /home/miracle/automaton-1.11-8.jar 2017-03-22 11:16:48,370 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://localhost:9000/tmp/temp159471787/tmp413788756/automaton-1.11-8.jar as file:/tmp/hadoop-miracle/mapred/local/1490206606124/automaton-1.11-8.jar 2017-03-22 11:16:48,371 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /tmp/hadoop-miracle/mapred/local/1490206606125/antlr-runtime-3.4.jar <- /home/miracle/antlr-runtime-3.4.jar 2017-03-22 11:16:48,384 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://localhost:9000/tmp/temp159471787/tmp-380031198/antlr-runtime-3.4.jar as file:/tmp/hadoop-miracle/mapred/local/1490206606125/antlr-runtime-3.4.jar 2017-03-22 11:16:48,389 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Creating symlink: /tmp/hadoop-miracle/mapred/local/1490206606126/joda-time-2.9.3.jar <- /home/miracle/joda-time-2.9.3.jar 2017-03-22 11:16:48,409 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - Localized hdfs://localhost:9000/tmp/temp159471787/tmp1163422388/joda-time-2.9.3.jar as file:/tmp/hadoop-miracle/mapred/local/1490206606126/joda-time-2.9.3.jar 2017-03-22 11:16:48,798 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - file:/tmp/hadoop-miracle/mapred/local/1490206606120/mongo-java-driver-3.0.4.jar 2017-03-22 11:16:48,803 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - file:/tmp/hadoop-miracle/mapred/local/1490206606121/mongo-hadoop-core-2.0.2.jar 2017-03-22 11:16:48,803 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - file:/tmp/hadoop-miracle/mapred/local/1490206606122/mongo-hadoop-pig-2.0.2.jar 2017-03-22 11:16:48,804 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - file:/tmp/hadoop-miracle/mapred/local/1490206606123/pig-0.16.0-core-h2.jar 2017-03-22 11:16:48,806 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - file:/tmp/hadoop-miracle/mapred/local/1490206606124/automaton-1.11-8.jar 2017-03-22 11:16:48,807 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - file:/tmp/hadoop-miracle/mapred/local/1490206606125/antlr-runtime-3.4.jar 2017-03-22 11:16:48,807 [JobControl] INFO org.apache.hadoop.mapred.LocalDistributedCacheManager - file:/tmp/hadoop-miracle/mapred/local/1490206606126/joda-time-2.9.3.jar 2017-03-22 11:16:48,807 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://localhost:8080/ 2017-03-22 11:16:48,809 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_local1070093788_0006 2017-03-22 11:16:48,812 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases student1 2017-03-22 11:16:48,812 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: student1[7,11] C: R: 2017-03-22 11:16:48,889 [Thread-455] INFO org.apache.hadoop.mapred.LocalJobRunner - OutputCommitter set in config null 2017-03-22 11:16:48,915 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete 2017-03-22 11:16:48,915 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_local1070093788_0006] 2017-03-22 11:16:48,999 [Thread-455] INFO com.mongodb.hadoop.pig.MongoUpdateStorage - Store location config: Configuration: ; for namespace: pavan.persons_info1; hosts: [localhost:27017] 2017-03-22 11:16:49,011 [Thread-455] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.tracker is deprecated. Instead, use mapreduce.jobtracker.address 2017-03-22 11:16:49,013 [Thread-455] INFO org.apache.hadoop.conf.Configuration.deprecation - mapred.job.reduce.markreset.buffer.percent is deprecated. Instead, use mapreduce.reduce.markreset.buffer.percent 2017-03-22 11:16:49,013 [Thread-455] INFO org.apache.hadoop.conf.Configuration.deprecation - fs.default.name is deprecated. Instead, use fs.defaultFS 2017-03-22 11:16:49,054 [Thread-455] INFO org.apache.hadoop.mapred.LocalJobRunner - OutputCommitter is org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputCommitter 2017-03-22 11:16:49,094 [Thread-455] INFO com.mongodb.hadoop.pig.MongoUpdateStorage - Store location config: Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, file:/tmp/hadoop-miracle/mapred/local/localRunner/miracle/job_local1070093788_0006/job_local1070093788_0006.xml; for namespace: pavan.persons_info1; hosts: [localhost:27017] 2017-03-22 11:16:49,104 [Thread-455] INFO com.mongodb.hadoop.output.MongoOutputCommitter - Setting up job. 2017-03-22 11:16:49,126 [Thread-455] INFO org.apache.hadoop.mapred.LocalJobRunner - Waiting for map tasks 2017-03-22 11:16:49,127 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.LocalJobRunner - Starting task: attempt_local1070093788_0006_m_000000_0 2017-03-22 11:16:49,253 [LocalJobRunner Map Task Executor #0] INFO com.mongodb.hadoop.pig.MongoUpdateStorage - Store location config: Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, file:/tmp/hadoop-miracle/mapred/local/localRunner/miracle/job_local1070093788_0006/job_local1070093788_0006.xml; for namespace: pavan.persons_info1; hosts: [localhost:27017] 2017-03-22 11:16:49,279 [LocalJobRunner Map Task Executor #0] INFO com.mongodb.hadoop.pig.MongoUpdateStorage - Store location config: Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, file:/tmp/hadoop-miracle/mapred/local/localRunner/miracle/job_local1070093788_0006/job_local1070093788_0006.xml; for namespace: pavan.persons_info1; hosts: [localhost:27017] 2017-03-22 11:16:49,290 [LocalJobRunner Map Task Executor #0] INFO com.mongodb.hadoop.output.MongoOutputCommitter - Setting up task. 2017-03-22 11:16:49,296 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.Task - Using ResourceCalculatorProcessTree : [ ] 2017-03-22 11:16:49,340 [LocalJobRunner Map Task Executor #0] INFO org.apache.hadoop.mapred.MapTask - Processing split: Number of splits :1 Total Length = 212 Input split[0]: Length = 212 ClassName: org.apache.hadoop.mapreduce.lib.input.FileSplit Locations: ----------------------- 2017-03-22 11:16:49,415 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.builtin.PigStorage - Using PigTextInputFormat 2017-03-22 11:16:49,417 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigRecordReader - Current split being processed hdfs://localhost:9000/input/student_dir/student_Info.txt:0+212 2017-03-22 11:16:49,459 [LocalJobRunner Map Task Executor #0] INFO com.mongodb.hadoop.pig.MongoUpdateStorage - Store location config: Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, file:/tmp/hadoop-miracle/mapred/local/localRunner/miracle/job_local1070093788_0006/job_local1070093788_0006.xml; for namespace: pavan.persons_info1; hosts: [localhost:27017] 2017-03-22 11:16:49,684 [LocalJobRunner Map Task Executor #0] INFO org.mongodb.driver.cluster - Cluster created with settings {hosts=[localhost:27017], mode=SINGLE, requiredClusterType=UNKNOWN, serverSelectionTimeout='30000 ms', maxWaitQueueSize=500} 2017-03-22 11:16:50,484 [LocalJobRunner Map Task Executor #0] INFO com.mongodb.hadoop.output.MongoRecordWriter - Writing to temporary file: /tmp/hadoop-miracle/attempt_local1070093788_0006_m_000000_0/_MONGO_OUT_TEMP/_out 2017-03-22 11:16:50,516 [LocalJobRunner Map Task Executor #0] INFO com.mongodb.hadoop.pig.MongoUpdateStorage - Preparing to write to com.mongodb.hadoop.output.MongoRecordWriter@1fd6ae6 2017-03-22 11:16:50,736 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.impl.util.SpillableMemoryManager - Selected heap (Tenured Gen) of size 699072512 to monitor. collectionUsageThreshold = 489350752, usageThreshold = 489350752 2017-03-22 11:16:50,739 [LocalJobRunner Map Task Executor #0] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized 2017-03-22 11:16:50,746 [LocalJobRunner Map Task Executor #0] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: student1[7,11] C: R: 2017-03-22 11:16:50,880 [Thread-455] INFO org.apache.hadoop.mapred.LocalJobRunner - map task executor complete. 2017-03-22 11:16:50,919 [Thread-455] INFO com.mongodb.hadoop.pig.MongoUpdateStorage - Store location config: Configuration: core-default.xml, core-site.xml, mapred-default.xml, mapred-site.xml, yarn-default.xml, yarn-site.xml, hdfs-default.xml, hdfs-site.xml, file:/tmp/hadoop-miracle/mapred/local/localRunner/miracle/job_local1070093788_0006/job_local1070093788_0006.xml; for namespace: pavan.persons_info1; hosts: [localhost:27017] 2017-03-22 11:16:50,963 [Thread-455] WARN org.apache.hadoop.mapred.LocalJobRunner - job_local1070093788_0006 java.lang.Exception: java.io.IOException: java.io.IOException: Couldn't convert tuple to bson: at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522) Caused by: java.io.IOException: java.io.IOException: Couldn't convert tuple to bson: at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.StoreFuncDecorator.putNext(StoreFuncDecorator.java:83) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:144) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigOutputFormat$PigRecordWriter.write(PigOutputFormat.java:97) at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:658) at org.apache.hadoop.mapreduce.task.TaskInputOutputContextImpl.write(TaskInputOutputContextImpl.java:89) at org.apache.hadoop.mapreduce.lib.map.WrappedMapper$Context.write(WrappedMapper.java:112) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map.collect(PigMapOnly.java:48) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:261) at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:65) at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:146) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:787) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) Caused by: java.io.IOException: Couldn't convert tuple to bson: at com.mongodb.hadoop.pig.MongoUpdateStorage.putNext(MongoUpdateStorage.java:165) at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.StoreFuncDecorator.putNext(StoreFuncDecorator.java:75) ... 17 more Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 at java.util.ArrayList.rangeCheck(ArrayList.java:653) at java.util.ArrayList.get(ArrayList.java:429) at org.apache.pig.data.DefaultTuple.get(DefaultTuple.java:117) at com.mongodb.hadoop.pig.JSONPigReplace.substitute(JSONPigReplace.java:120) at com.mongodb.hadoop.pig.MongoUpdateStorage.putNext(MongoUpdateStorage.java:142) ... 18 more 2017-03-22 11:16:53,944 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure. 2017-03-22 11:16:53,944 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_local1070093788_0006 has failed! Stop running all dependent jobs 2017-03-22 11:16:53,945 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete 2017-03-22 11:16:53,949 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized 2017-03-22 11:16:53,954 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized 2017-03-22 11:16:53,962 [main] ERROR org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) failed! 2017-03-22 11:16:53,981 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics: HadoopVersion PigVersion UserId StartedAt FinishedAt Features 2.7.3 0.16.0 miracle 2017-03-22 11:16:43 2017-03-22 11:16:53 UNKNOWN Failed! Failed Jobs: JobId Alias Feature Message Outputs job_local1070093788_0006 student1 MAP_ONLY Message: Job failed! mongodb://localhost:27017/pavan.persons_info1, Input(s): Failed to read data from "hdfs://localhost:9000/input/student_dir/student_Info.txt" Output(s): Failed to produce result in "mongodb://localhost:27017/pavan.persons_info1" Counters: Total records written : 0 Total bytes written : 0 Spillable Memory Manager spill count : 0 Total bags proactively spilled: 0 Total records proactively spilled: 0 Job DAG: job_local1070093788_0006 2017-03-22 11:16:53,983 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed! 2017-03-22 11:16:54,004 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1002: Unable to store alias student1 Details at logfile: /home/miracle/pig_1490205716326.log 

這是我在這里輸入的輸入,

Input(s):

成功讀取6條記錄(5378419字節):“hdfs:// localhost:9000 / input / student_dir / student_Info.txt”

輸出:成功存儲6條記錄(5378449字節):“hdfs:// localhost:9000 / tmp / temp-1419179625 / tmp882976412”

計數器:寫入的總記錄數:6寫入的總字節數:5378449溢出內存管理器溢出數:0主要溢出的行李總數:0主動溢出的總記錄數:0

工作DAG:job_local1866034015_0001

2017-03-23 02:43:37,677 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - 無法使用processName = JobTracker初始化JVM指標,sessionId = - 已初始化2017-03-23 02:43:37,681 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - 無法使用processName = JobTracker初始化JVM指標,sessionId = - 已初始化2017-03-23 02:43:37,689 [main] INFO org.apache.hadoop。 metrics.jvm.JvmMetrics - 無法使用processName = JobTracker初始化JVM指標,sessionId = - 已初始化2017-03-23 02:43:37,736 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 成功! 2017-03-23 02:43:37,748 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - 不推薦使用fs.default.name。 相反,請使用fs.defaultFS 2017-03-23 02:43:37,751 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend已經初始化2017-03-23 02:43:37,793 [main] INFO org .apache.hadoop.mapreduce.lib.input.FileInputFormat - 要處理的總輸入路徑:1 2017-03-23 02:43:37,793 [main] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - 要處理的總輸入路徑:1 (Rajiv,Reddy,9848022337,Hyderabad)(siddarth,Battacharya,9848022338,Kolkata)(Rajesh,Khanna,9848022339,Delhi)(Preethi,Agarwal,9848022330,Pune)(Trupthi,Mohanthy,9848022336, Bhuwaneshwar)(Archana,Mishra,9848022335,Chennai。)

我不知道下一步該怎么辦請給我任何建議。

我不使用Mongo,但是從HBase的經驗來看,從錯誤來看,似乎很少有字段不足以使其適合列系列。 並且您嘗試存儲的列中很少有不存在的列。 請參閱DUMP數據而不是STORE,並檢查它是否與您應用的STORE結構相匹配。

引起:java.io.IOException:無法將元組轉換為bson:at com.mongodb.hadoop.pig.MongoUpdateStorage.putNext(MongoUpdateStorage.java:165)at org.apache.pig.backend.hadoop.executionengine.physicalLayer .relationalOperators.StoreFuncDecorator.putNext(StoreFuncDecorator.java:75)... 17更多引起:java.lang.IndexOutOfBoundsException:索引:1,大小:1,java.util.ArrayList.rangeCheck(ArrayList.java:653)at java.util.ArrayList.get(ArrayList.java:429)atg.apache.pig.data.DefaultTuple.get(DefaultTuple.java:117)at com.mongodb.hadoop.pig.JSONPigReplace.substitute(JSONPigReplace.java: 120)at com.mongodb.hadoop.pig.MongoUpdateStorage.putNext(MongoUpdateStorage.java:142)... 18更多

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM