简体   繁体   中英

Error processing complex json object of twitter with pig JsonLoader() of elephant-bird Jars

I wanted to process twitter json object with pig using elephant-bird jars for which i wrote the pig script as below.

 REGISTER '/usr/lib/pig/lib/elephant-bird-hadoop-compat-4.1.jar'; REGISTER '/usr/lib/pig/lib/elephant-bird-pig-4.1.jar'; A = LOAD '/user/flume/tweets/data.json' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') AS myMap; B = FOREACH A GENERATE myMap#'id' AS ID,myMap#'created_at' AS createdAT; DUMP B; 

which gave me error as below

 2015-08-25 11:06:34,295 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1439883208520_0177 2015-08-25 11:06:34,295 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases A,B 2015-08-25 11:06:34,295 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: A[3,4],B[4,4] C: R: 2015-08-25 11:06:34,303 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete 2015-08-25 11:06:34,303 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1439883208520_0177] 2015-08-25 11:07:06,449 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 50% complete 2015-08-25 11:07:06,449 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1439883208520_0177] 2015-08-25 11:07:09,458 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure. 2015-08-25 11:07:09,458 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_1439883208520_0177 has failed! Stop running all dependent jobs 2015-08-25 11:07:09,459 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete 2015-08-25 11:07:09,667 [main] INFO org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl - Timeline service address: http://trinityhadoopmaster.com:8188/ws/v1/timeline/ 2015-08-25 11:07:09,668 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at trinityhadoopmaster.com/192.168.1.135:8032 2015-08-25 11:07:09,678 [main] INFO org.apache.hadoop.mapred.ClientServiceDelegate - Application state is completed. FinalApplicationStatus=FAILED. Redirecting to job history server 2015-08-25 11:07:09,779 [main] ERROR org.apache.pig.tools.pigstats.PigStats - ERROR 0: java.lang.ClassNotFoundException: org.json.simple.parser.ParseException 2015-08-25 11:07:09,779 [main] ERROR org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) failed! 2015-08-25 11:07:09,780 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics: HadoopVersion PigVersion UserId StartedAt FinishedAt Features 2.6.0 0.14.0 hdfs 2015-08-25 11:06:33 2015-08-25 11:07:09 UNKNOWN Failed! Failed Jobs: JobId Alias Feature Message Outputs job_1439883208520_0177 A,B MAP_ONLY Message: Job failed! hdfs://trinityhadoopmaster.com:9000/tmp/temp1554332510/tmp835744559, Input(s): Failed to read data from "hdfs://trinityhadoopmaster.com:9000/user/flume/tweets/data.json" Output(s): Failed to produce result in "hdfs://trinityhadoopmaster.com:9000/tmp/temp1554332510/tmp835744559" Counters: Total records written : 0 Total bytes written : 0 Spillable Memory Manager spill count : 0 Total bags proactively spilled: 0 Total records proactively spilled: 0 Job DAG: job_1439883208520_0177 2015-08-25 11:07:09,780 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed! 2015-08-25 11:07:09,787 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias B. Backend error : java.lang.ClassNotFoundException: org.json.simple.parser.ParseException Details at logfile: /tmp/pig-err.log grunt> 

which i have no clue on how to approach, can any one help me on this.

REGISTER '/tmp/elephant-bird-core-4.1.jar';

REGISTER '/tmp/elephant-bird-pig-4.1.jar';

REGISTER '/tmp/elephant-bird-hadoop-compat-4.1.jar';

REGISTER '/tmp/google-collections-1.0.jar';

REGISTER '/tmp/json-simple-1.1.jar';

It works.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM