
Hive archive partition(dynamic) FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

I'm trying to archive some old data from my table using an ALTER TABLE TABLE_NAME ARCHIVE PARTITION(part_col) query.

Hadoop version - 2.7.3
Hive version - 1.2.1

Table structure is as follows,

hive> desc clicks_fact;
OK
time                    timestamp                                   
user_id                 varchar(32)                                 
advertiser_id           int                                         
buy_id                  int                                         
ad_id                   int                                         
creative_id             int                                         
creative_version        smallint                                    
creative_size           varchar(10)                                 
site_id                 int                                         
page_id                 int                                         
keyword                 varchar(48)                                 
country_id              varchar(10)                                 
state                   varchar(10)                                 
area_code               int                                         
browser_id              smallint                                    
browser_version         varchar(10)                                 
os_id                   int                                         
zip                     varchar(10)                                 
site_data               varchar(20)                                 
sv1                     varchar(10)                                 
day                     date                                        
file_date               varchar(8)                                  

# Partition Information      
# col_name              data_type               comment             

day                     date                                        
file_date               varchar(8)                                  
Time taken: 0.112 seconds, Fetched: 28 row(s)

Now, I'm trying to archive the data for a specific partition, as follows:

hive> ALTER TABLE clicks_fact ARCHIVE partition(day='2017-06-30', file_date='20170629');
intermediate.archived is hdfs://localhost:54310/user/hive/warehouse/scheme.db/clicks_fact/day=2017-06-30/file_date=20170629_INTERMEDIATE_ARCHIVED
intermediate.original is hdfs://localhost:54310/user/hive/warehouse/scheme.db/clicks_fact/day=2017-06-30/file_date=20170629_INTERMEDIATE_ORIGINAL
Creating data.har for hdfs://localhost:54310/user/hive/warehouse/scheme.db/clicks_fact/day=2017-06-30/file_date=20170629
in hdfs://localhost:54310/user/hive/warehouse/scheme.db/clicks_fact/day=2017-06-30/file_date=20170629/.hive-staging_hive_2017-10-12_22-03-17_129_6395228918576649008-1/-ext-10000/partlevel
Please wait... (this may take a while)
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org/apache/hadoop/tools/HadoopArchives

I'm able to create a HAR in Hadoop directly, using,

$ hadoop archive -archiveName archive.har -p /mydir_* /

So it's not a dependency issue inside Hadoop.
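In case it's relevant: the HadoopArchives class named in the error normally lives under Hadoop's tools directory, which is not on Hive's classpath by default. A quick way to check both sides (a sketch assuming a stock Apache Hadoop 2.7.x layout; adjust the paths to your install):

$ ls $HADOOP_HOME/share/hadoop/tools/lib/ | grep -i archives    # jar that ships org.apache.hadoop.tools.HadoopArchives
$ ls $HIVE_HOME/lib/ | grep -i archives                         # whether a copy is visible to Hive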

Any help would be greatly appreciated.

Logs:

2017-10-23 22:26:39,210 INFO  [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - <PERFLOG method=Driver.run from=org.apache.hadoop.hive.ql.Driver>
2017-10-23 22:26:39,211 INFO  [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - <PERFLOG method=TimeToSubmit from=org.apache.hadoop.hive.ql.Driver>
2017-10-23 22:26:39,211 INFO  [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - <PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver>
2017-10-23 22:26:39,213 INFO  [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - <PERFLOG method=parse from=org.apache.hadoop.hive.ql.Driver>
2017-10-23 22:26:39,213 INFO  [main]: parse.ParseDriver (ParseDriver.java:parse(185)) - Parsing command: alter table clicks_fact archive partition(day='2017-06-30', file_date='20170629')
2017-10-23 22:26:39,223 INFO  [main]: parse.ParseDriver (ParseDriver.java:parse(209)) - Parse Completed
2017-10-23 22:26:39,224 INFO  [main]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - </PERFLOG method=parse start=1508777799213 end=1508777799224 duration=11 from=org.apache.hadoop.hive.ql.Driver>
2017-10-23 22:26:39,225 INFO  [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - <PERFLOG method=semanticAnalyze from=org.apache.hadoop.hive.ql.Driver>
2017-10-23 22:26:39,234 INFO  [main]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(746)) - 0: get_table : db=scheme tbl=clicks_fact
2017-10-23 22:26:39,235 INFO  [main]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(371)) - ugi=sridhar ip=unknown-ip-addr  cmd=get_table : db=scheme tbl=clicks_fact   
2017-10-23 22:26:39,410 INFO  [main]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(746)) - 0: get_partitions_ps_with_auth : db=scheme tbl=clicks_fact[2017-06-30,20170629]
2017-10-23 22:26:39,410 INFO  [main]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(371)) - ugi=sridhar ip=unknown-ip-addr  cmd=get_partitions_ps_with_auth : db=scheme tbl=clicks_fact[2017-06-30,20170629]    
2017-10-23 22:26:39,463 INFO  [main]: ql.Driver (Driver.java:compile(436)) - Semantic Analysis Completed
2017-10-23 22:26:39,463 INFO  [main]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - </PERFLOG method=semanticAnalyze start=1508777799225 end=1508777799463 duration=238 from=org.apache.hadoop.hive.ql.Driver>
2017-10-23 22:26:39,463 INFO  [main]: ql.Driver (Driver.java:getSchema(240)) - Returning Hive schema: Schema(fieldSchemas:null, properties:null)
2017-10-23 22:26:39,463 INFO  [main]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - </PERFLOG method=compile start=1508777799211 end=1508777799463 duration=252 from=org.apache.hadoop.hive.ql.Driver>
2017-10-23 22:26:39,463 INFO  [main]: ql.Driver (Driver.java:checkConcurrency(160)) - Concurrency mode is disabled, not creating a lock manager
2017-10-23 22:26:39,464 INFO  [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - <PERFLOG method=Driver.execute from=org.apache.hadoop.hive.ql.Driver>
2017-10-23 22:26:39,464 INFO  [main]: ql.Driver (Driver.java:execute(1328)) - Starting command(queryId=sridhar_20171023222639_d1453a90-0340-411c-b131-77d112862acc): alter table clicks_fact archive partition(day='2017-06-30', file_date='20170629')
2017-10-23 22:26:39,465 INFO  [main]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - </PERFLOG method=TimeToSubmit start=1508777799211 end=1508777799465 duration=254 from=org.apache.hadoop.hive.ql.Driver>
2017-10-23 22:26:39,465 INFO  [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - <PERFLOG method=runTasks from=org.apache.hadoop.hive.ql.Driver>
2017-10-23 22:26:39,465 INFO  [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - <PERFLOG method=task.DDL.Stage-0 from=org.apache.hadoop.hive.ql.Driver>
2017-10-23 22:26:39,465 INFO  [main]: ql.Driver (Driver.java:launchTask(1651)) - Starting task [Stage-0:DDL] in serial mode
2017-10-23 22:26:39,465 INFO  [main]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(746)) - 0: get_table : db=scheme tbl=clicks_fact
2017-10-23 22:26:39,466 INFO  [main]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(371)) - ugi=sridhar ip=unknown-ip-addr  cmd=get_table : db=scheme tbl=clicks_fact   
2017-10-23 22:26:39,489 INFO  [main]: metastore.HiveMetaStore (HiveMetaStore.java:logInfo(746)) - 0: get_partitions_ps_with_auth : db=scheme tbl=clicks_fact[2017-06-30,20170629]
2017-10-23 22:26:39,489 INFO  [main]: HiveMetaStore.audit (HiveMetaStore.java:logAuditEvent(371)) - ugi=sridhar ip=unknown-ip-addr  cmd=get_partitions_ps_with_auth : db=scheme tbl=clicks_fact[2017-06-30,20170629]    
2017-10-23 22:26:39,526 INFO  [main]: exec.Task (SessionState.java:printInfo(951)) - intermediate.archived is hdfs://localhost:54310/user/hive/warehouse/scheme.db/clicks_fact/day=2017-06-30/file_date=20170629_INTERMEDIATE_ARCHIVED
2017-10-23 22:26:39,526 INFO  [main]: exec.Task (SessionState.java:printInfo(951)) - intermediate.original is hdfs://localhost:54310/user/hive/warehouse/scheme.db/clicks_fact/day=2017-06-30/file_date=20170629_INTERMEDIATE_ORIGINAL
2017-10-23 22:26:39,542 INFO  [main]: common.FileUtils (FileUtils.java:mkdir(501)) - Creating directory if it doesn't exist: hdfs://localhost:54310/user/hive/warehouse/scheme.db/clicks_fact/day=2017-06-30/file_date=20170629/.hive-staging_hive_2017-10-23_22-26-39_212_2574575409261622278-1
2017-10-23 22:26:39,616 INFO  [main]: exec.Task (SessionState.java:printInfo(951)) - Creating data.har for hdfs://localhost:54310/user/hive/warehouse/scheme.db/clicks_fact/day=2017-06-30/file_date=20170629
2017-10-23 22:26:39,616 INFO  [main]: exec.Task (SessionState.java:printInfo(951)) - in hdfs://localhost:54310/user/hive/warehouse/scheme.db/clicks_fact/day=2017-06-30/file_date=20170629/.hive-staging_hive_2017-10-23_22-26-39_212_2574575409261622278-1/-ext-10000/partlevel
2017-10-23 22:26:39,616 INFO  [main]: exec.Task (SessionState.java:printInfo(951)) - Please wait... (this may take a while)
2017-10-23 22:26:39,645 INFO  [main]: Configuration.deprecation (Configuration.java:warnOnceIfDeprecated(1173)) - session.id is deprecated. Instead, use dfs.metrics.session-id
2017-10-23 22:26:39,646 INFO  [main]: jvm.JvmMetrics (JvmMetrics.java:init(76)) - Initializing JVM Metrics with processName=JobTracker, sessionId=
2017-10-23 22:26:39,656 ERROR [main]: exec.DDLTask (DDLTask.java:failed(520)) - java.lang.NoSuchMethodError: org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(Lorg/apache/hadoop/mapred/JobClient;Lorg/apache/hadoop/conf/Configuration;)Lorg/apache/hadoop/fs/Path;
    at org.apache.hadoop.tools.HadoopArchives.archive(HadoopArchives.java:476)
    at org.apache.hadoop.tools.HadoopArchives.run(HadoopArchives.java:862)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.hadoop.hive.ql.exec.DDLTask.archive(DDLTask.java:1359)
    at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:360)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:160)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:88)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1653)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1412)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1195)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1059)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1049)
    at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:213)
    at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:165)
    at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:376)
    at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:736)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:681)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:621)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

2017-10-23 22:26:39,656 ERROR [main]: ql.Driver (SessionState.java:printError(960)) - FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. org.apache.hadoop.mapreduce.JobSubmissionFiles.getStagingDir(Lorg/apache/hadoop/mapred/JobClient;Lorg/apache/hadoop/conf/Configuration;)Lorg/apache/hadoop/fs/Path;
2017-10-23 22:26:39,656 INFO  [main]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - </PERFLOG method=Driver.execute start=1508777799464 end=1508777799656 duration=192 from=org.apache.hadoop.hive.ql.Driver>
2017-10-23 22:26:39,656 INFO  [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
2017-10-23 22:26:39,656 INFO  [main]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - </PERFLOG method=releaseLocks start=1508777799656 end=1508777799656 duration=0 from=org.apache.hadoop.hive.ql.Driver>
2017-10-23 22:26:39,673 INFO  [main]: log.PerfLogger (PerfLogger.java:PerfLogBegin(121)) - <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
2017-10-23 22:26:39,673 INFO  [main]: log.PerfLogger (PerfLogger.java:PerfLogEnd(148)) - </PERFLOG method=releaseLocks start=1508777799673 end=1508777799673 duration=0 from=org.apache.hadoop.hive.ql.Driver>

Looks like the dependency was the problem.

I had initially added hadoop-tools.jar as a dependency (inside $HIVE_HOME/lib), and that is what caused the problem. The NoSuchMethodError on JobSubmissionFiles.getStagingDir(JobClient, Configuration) in the log suggests that jar bundled a HadoopArchives compiled against the old pre-YARN API, which no longer matches Hadoop 2.7.3. It got resolved after I added hadoop-archives.jar as the dependency instead of hadoop-tools.jar.
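For anyone hitting the same thing, a minimal sketch of the fix, assuming a stock Apache Hadoop 2.7.3 layout (adjust the version suffix to match your install):

$ cp $HADOOP_HOME/share/hadoop/tools/lib/hadoop-archives-2.7.3.jar $HIVE_HOME/lib/
$ # remove any stale hadoop-tools.jar copied in earlier, then restart the Hive CLI so the new jar is picked up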

Thanks for the help, @Joby & @Max08.

A few points to check:

set hive.archive.enabled=true;

set hive.metastore.schema.verification=true;

Hive now records the schema version in the metastore database and verifies that the metastore schema version is compatible with the Hive binaries that are going to access the metastore. Note that the Hive properties to implicitly create or alter the existing schema are disabled by default; Hive will not attempt to change the metastore schema implicitly. When you execute a Hive query against an old schema, it will fail to access the metastore.
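Putting those settings together with the command from the question, a minimal session would look like this (hive.archive.enabled must be set before the ALTER TABLE, since archiving is disabled by default):

hive> set hive.archive.enabled=true;
hive> ALTER TABLE clicks_fact ARCHIVE PARTITION(day='2017-06-30', file_date='20170629');

The operation can be reversed later with ALTER TABLE clicks_fact UNARCHIVE PARTITION(...), which unpacks the HAR back into the original files.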

I use hive --auxpath $HADOOP_HOME/share/hadoop/tools/lib/hadoop-archives-2.7.2.jar and it works.

Hive uses --auxpath to specify auxiliary jars that will be loaded when creating a new session. By default, this jar will not be loaded unless --auxpath is given.
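If you don't want to pass the flag on every invocation, the same jar can be made available persistently. A sketch, assuming your install reads $HIVE_HOME/conf/hive-env.sh:

# in $HIVE_HOME/conf/hive-env.sh
export HIVE_AUX_JARS_PATH=$HADOOP_HOME/share/hadoop/tools/lib/hadoop-archives-2.7.2.jar

Copying the jar into $HIVE_HOME/lib (as in the accepted fix above) also works, since everything in lib is on Hive's classpath.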
