
Loading Data from Hive to Pig: Error while Dumping Dataset

The table retail_db.categories has 58 rows.

$ pig -useHCatalog
grunt> pcategories = LOAD 'retail_db.categories' USING org.apache.hive.hcatalog.pig.HCatLoader();
grunt> b = LIMIT pcategories 100;
grunt> dump b;
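Before dumping, it can also help to confirm that HCatLoader actually picked up the table's schema. A minimal sanity check in the same grunt session (the column names shown are only an illustration, not the real schema):

```pig
-- DESCRIBE only queries the metastore, so it succeeds even when the
-- MapReduce job later fails; a schema here proves the HCatalog hookup works.
grunt> describe pcategories;
-- Expected shape (illustrative): pcategories: {category_id: int, ..., category_name: chararray}
```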

With the LIMIT, I get all the records. But when I try to dump the original dataset:

grunt> dump pcategories;

I get the following error:

2018-04-15 16:27:46,444 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2018-04-15 16:27:46,723 [main] INFO org.apache.hadoop.hive.metastore.ObjectStore - ObjectStore, initialize called
2018-04-15 16:27:47,170 [main] INFO org.apache.hadoop.hive.metastore.MetaStoreDirectSql - Using direct SQL, underlying DB is MYSQL
2018-04-15 16:27:47,171 [main] INFO org.apache.hadoop.hive.metastore.ObjectStore - Initialized ObjectStore
2018-04-15 16:27:47,171 [main] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - 0: get_databases: NonExistentDatabaseUsedForHealthCheck
2018-04-15 16:27:47,171 [main] INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit - ugi=jay ip=unknown-ip-addr cmd=get_databases: NonExistentDatabaseUsedForHealthCheck
2018-04-15 16:27:47,184 [main] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - 0: get_table : db=retail_db tbl=categories
2018-04-15 16:27:47,184 [main] INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit - ugi=jay ip=unknown-ip-addr cmd=get_table : db=retail_db tbl=categories
2018-04-15 16:27:47,219 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2018-04-15 16:27:47,244 [main] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - 0: get_databases: NonExistentDatabaseUsedForHealthCheck
2018-04-15 16:27:47,244 [main] INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit - ugi=jay ip=unknown-ip-addr cmd=get_databases: NonExistentDatabaseUsedForHealthCheck
2018-04-15 16:27:47,247 [main] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - 0: get_table : db=retail_db tbl=departments
2018-04-15 16:27:47,247 [main] INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit - ugi=jay ip=unknown-ip-addr cmd=get_table : db=retail_db tbl=departments
2018-04-15 16:27:47,261 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2018-04-15 16:27:47,284 [main] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - 0: get_databases: NonExistentDatabaseUsedForHealthCheck
2018-04-15 16:27:47,284 [main] INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit - ugi=jay ip=unknown-ip-addr cmd=get_databases: NonExistentDatabaseUsedForHealthCheck
2018-04-15 16:27:47,286 [main] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - 0: get_table : db=retail_db tbl=categories
2018-04-15 16:27:47,286 [main] INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit - ugi=jay ip=unknown-ip-addr cmd=get_table : db=retail_db tbl=categories
2018-04-15 16:27:47,386 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2018-04-15 16:27:47,388 [main] INFO org.apache.pig.tools.pigstats.ScriptState - Pig features used in the script: UNKNOWN
2018-04-15 16:27:47,397 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2018-04-15 16:27:47,397 [main] WARN org.apache.pig.data.SchemaTupleBackend - SchemaTupleBackend has already been initialized
2018-04-15 16:27:47,397 [main] INFO org.apache.pig.newplan.logical.optimizer.LogicalPlanOptimizer - {RULES_ENABLED=[AddForEach, ColumnMapKeyPrune, ConstantCalculator, GroupByConstParallelSetter, LimitOptimizer, LoadTypeCastInserter, MergeFilter, MergeForEach, NestedLimitOptimizer, PartitionFilterOptimizer, PredicatePushdownOptimizer, PushDownForEachFlatten, PushUpFilter, SplitFilter, StreamTypeCastInserter]}
2018-04-15 16:27:47,398 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MRCompiler - File concatenation threshold: 100 optimistic? false
2018-04-15 16:27:47,399 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2018-04-15 16:27:47,399 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
2018-04-15 16:27:47,406 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2018-04-15 16:27:47,407 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2018-04-15 16:27:47,409 [main] INFO org.apache.pig.tools.pigstats.mapreduce.MRScriptState - Pig script settings are added to the job
2018-04-15 16:27:47,409 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - mapred.job.reduce.markreset.buffer.percent is not set, set to default 0.3
2018-04-15 16:27:47,435 [main] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - 0: get_databases: NonExistentDatabaseUsedForHealthCheck
2018-04-15 16:27:47,435 [main] INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit - ugi=jay ip=unknown-ip-addr cmd=get_databases: NonExistentDatabaseUsedForHealthCheck
2018-04-15 16:27:47,437 [main] INFO org.apache.hadoop.hive.metastore.HiveMetaStore - 0: get_table : db=retail_db tbl=categories
2018-04-15 16:27:47,437 [main] INFO org.apache.hadoop.hive.metastore.HiveMetaStore.audit - ugi=jay ip=unknown-ip-addr cmd=get_table : db=retail_db tbl=categories
2018-04-15 16:27:47,458 [main] INFO org.apache.hadoop.conf.Configuration.deprecation - yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2018-04-15 16:27:47,458 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - This job cannot be converted run in-process
2018-04-15 16:27:48,419 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/apache-hive-2.3.2-bin/lib/hive-metastore-2.3.2.jar to DistributedCache through /tmp/temp-1113251818/tmp122824794/hive-metastore-2.3.2.jar
2018-04-15 16:27:48,608 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/apache-hive-2.3.2-bin/lib/libthrift-0.9.3.jar to DistributedCache through /tmp/temp-1113251818/tmp1608619006/libthrift-0.9.3.jar
2018-04-15 16:27:49,708 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/apache-hive-2.3.2-bin/lib/hive-exec-2.3.2.jar to DistributedCache through /tmp/temp-1113251818/tmp1023486409/hive-exec-2.3.2.jar
2018-04-15 16:27:50,352 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/apache-hive-2.3.2-bin/lib/libfb303-0.9.3.jar to DistributedCache through /tmp/temp-1113251818/tmp-207303388/libfb303-0.9.3.jar
2018-04-15 16:27:51,375 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/apache-hive-2.3.2-bin/lib/jdo-api-3.0.1.jar to DistributedCache through /tmp/temp-1113251818/tmp120570913/jdo-api-3.0.1.jar
2018-04-15 16:27:51,497 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/apache-hive-2.3.2-bin/lib/slf4j-api-1.7.25.jar to DistributedCache through /tmp/temp-1113251818/tmp1251741235/slf4j-api-1.7.25.jar
2018-04-15 16:27:51,786 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/apache-hive-2.3.2-bin/lib/hive-hbase-handler-2.3.2.jar to DistributedCache through /tmp/temp-1113251818/tmp1351750668/hive-hbase-handler-2.3.2.jar
2018-04-15 16:27:52,653 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig-0.17.0/pig-0.17.0-core-h2.jar to DistributedCache through /tmp/temp-1113251818/tmp1548980484/pig-0.17.0-core-h2.jar
2018-04-15 16:27:53,042 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/apache-hive-2.3.2-bin/hcatalog/share/hcatalog/hive-hcatalog-pig-adapter-2.3.2.jar to DistributedCache through /tmp/temp-1113251818/tmp-2078279932/hive-hcatalog-pig-adapter-2.3.2.jar
2018-04-15 16:27:53,197 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/pig-0.17.0/lib/automaton-1.11-8.jar to DistributedCache through /tmp/temp-1113251818/tmp1231439146/automaton-1.11-8.jar
2018-04-15 16:27:53,875 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Added jar file:/usr/local/apache-hive-2.3.2-bin/lib/antlr-runtime-3.5.2.jar to DistributedCache through /tmp/temp-1113251818/tmp970518288/antlr-runtime-3.5.2.jar
2018-04-15 16:27:53,900 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler - Setting up single store job
2018-04-15 16:27:53,920 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 1 map-reduce job(s) waiting for submission.
2018-04-15 16:27:53,922 [JobControl] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2018-04-15 16:27:54,152 [JobControl] INFO org.apache.hadoop.mapreduce.JobResourceUploader - Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/jay/.staging/job_1523787662857_0004
2018-04-15 16:27:54,197 [JobControl] WARN org.apache.hadoop.mapreduce.JobResourceUploader - No job jar file set. User classes may not be found. See Job or Job#setJar(String).
2018-04-15 16:27:54,232 [JobControl] INFO org.apache.hadoop.mapred.FileInputFormat - Total input files to process : 1
2018-04-15 16:27:54,232 [JobControl] INFO org.apache.pig.backend.hadoop.executionengine.util.MapRedUtil - Total input paths (combined) to process : 1
2018-04-15 16:27:54,631 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - number of splits:1
2018-04-15 16:27:55,247 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Submitting tokens for job: job_1523787662857_0004
2018-04-15 16:27:55,247 [JobControl] INFO org.apache.hadoop.mapreduce.JobSubmitter - Executing with tokens: []
2018-04-15 16:27:55,253 [JobControl] INFO org.apache.hadoop.mapred.YARNRunner - Job jar is not present. Not adding any jar to the list of resources.
2018-04-15 16:27:55,503 [JobControl] INFO org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Submitted application application_1523787662857_0004
2018-04-15 16:27:55,733 [JobControl] INFO org.apache.hadoop.mapreduce.Job - The url to track the job: http://jay-Lenovo-Z50-70:8088/proxy/application_1523787662857_0004/
2018-04-15 16:27:55,733 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - HadoopJobId: job_1523787662857_0004
2018-04-15 16:27:55,733 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Processing aliases pcategories
2018-04-15 16:27:55,733 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - detailed locations: M: pcategories[3,14] C: R:
2018-04-15 16:27:55,877 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2018-04-15 16:27:55,877 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Running jobs are [job_1523787662857_0004]
2018-04-15 16:28:27,422 [main] WARN org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Ooops! Some job has failed! Specify -stop_on_failure if you want Pig to stop immediately on failure.
2018-04-15 16:28:27,422 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - job job_1523787662857_0004 has failed! Stop running all dependent jobs
2018-04-15 16:28:27,422 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2018-04-15 16:28:27,424 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2018-04-15 16:28:27,580 [main] INFO org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at /0.0.0.0:8032
2018-04-15 16:28:27,827 [main] ERROR org.apache.pig.tools.pigstats.mapreduce.MRPigStatsUtil - 1 map reduce job(s) failed!
2018-04-15 16:28:27,827 [main] INFO org.apache.pig.tools.pigstats.mapreduce.SimplePigStats - Script Statistics:

HadoopVersion  PigVersion  UserId  StartedAt            FinishedAt           Features
3.0.0          0.17.0      jay     2018-04-15 16:27:47  2018-04-15 16:28:27  UNKNOWN

Failed!

Failed Jobs:
JobId                   Alias        Feature   Message              Outputs
job_1523787662857_0004  pcategories  MAP_ONLY  Message: Job failed! hdfs://localhost:9000/tmp/temp-1113251818/tmp-83503168,

Input(s): Failed to read data from "retail_db.categories"

Output(s): Failed to produce result in "hdfs://localhost:9000/tmp/temp-1113251818/tmp-83503168"

Counters:
Total records written : 0
Total bytes written : 0
Spillable Memory Manager spill count : 0
Total bags proactively spilled: 0
Total records proactively spilled: 0

Job DAG: job_1523787662857_0004

2018-04-15 16:28:27,828 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Failed!
2018-04-15 16:28:27,836 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1066: Unable to open iterator for alias pcategories
Details at logfile: /home/jay/pig_1523787729987.log
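ERROR 1066 is only the generic client-side symptom; the real cause is in the YARN container logs and in the Pig client logfile the message names. One way to pull them, assuming log aggregation is enabled on the cluster (the application id is the one from the run above):

```shell
# Fetch the aggregated container logs for the failed application.
yarn logs -applicationId application_1523787662857_0004 | less

# The Pig client log named in the error message carries the full stack trace:
less /home/jay/pig_1523787729987.log
```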

AM Container for appattempt_1523799060075_0001_000002 exited with exitCode: 1
Failing this attempt. Diagnostics: [2018-04-15 19:02:58.344]Exception from container-launch.
Container id: container_1523799060075_0001_02_000001
Exit code: 1
[2018-04-15 19:02:58.348]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
[2018-04-15 19:02:58.348]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
For more detailed output, check the application tracking page: http://jay-Lenovo-Z50-70:8088/cluster/app/application_1523799060075_0001 Then click on links to logs of each attempt.

This is what I get after clicking the link.
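The ApplicationMaster exiting with code 1 while stderr shows only log4j warnings often means the MR AM could not assemble a working classpath, a known pain point when running Pig 0.17 against Hadoop 3.x. A hedged sketch of one common workaround, pinning HADOOP_MAPRED_HOME in mapred-site.xml (the property names are real Hadoop 3 settings; the value assumes HADOOP_HOME points at your Hadoop install, so adjust the path to your setup):

```xml
<!-- mapred-site.xml: sketch only; verify against your install layout -->
<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
  <name>mapreduce.map.env</name>
  <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
  <name>mapreduce.reduce.env</name>
  <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
```

If the AM still dies, the stderr of the failed container (via the tracking page above or `yarn logs`) is the authoritative place to look.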

It worked fine for me. I ran the commands below:

$ pig -useHCatalog
grunt> pcategories = LOAD 'hive_testing.address' USING org.apache.hive.hcatalog.pig.HCatLoader();
grunt> dump pcategories;

Here I have created a dummy address table in my database.

Output

(101,india,xxx)

So the issue could be with your dataset and not with the commands you are running.
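To test the dataset theory, you can inspect the table's backing files directly from grunt. A sketch, assuming the default Hive warehouse location (hive.metastore.warehouse.dir) — adjust the path if your metastore stores the table elsewhere:

```pig
-- List and read the files that back the Hive table; if these commands fail
-- or show corrupt rows, the problem is the data, not the Pig script.
grunt> fs -ls /user/hive/warehouse/retail_db.db/categories;
grunt> fs -cat /user/hive/warehouse/retail_db.db/categories/*;
```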
