繁体   English   中英

Hadoop作业输入文件的位置

[英]Location of a Hadoop job input file

我正在尝试在Spring批处理作业中生成序列文件,以传递给Hadoop映射/归约。 通过将文件手动复制到hdfs,我设法使该工作能够工作一次。 当它在我的本地系统测试中运行时,它运行良好,因为本地文件系统找到了该文件。 但是,当我尝试将其部署到远程Hadoop实例时,出现以下异常。

org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://ngram-test:9000/user/hduser/DocumentsPTOgrants2007_2011.seq
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
    at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:55)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
    at com.atsid.hadoop.jobs.AbstractJobRunner.executeJob(AbstractJobRunner.java:70)
    at com.atsid.hadoop.jobs.AbstractJobRunner.run(AbstractJobRunner.java:36)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at com.atsid.cloudbase.ngram.ingest.mapreduce.NGramIngestJobRunner.main(NGramIngestJobRunner.java:34)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.springframework.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:191)
    at org.springframework.data.hadoop.mapreduce.JarExecutor.invokeTargetObject(JarExecutor.java:71)
    at org.springframework.data.hadoop.mapreduce.HadoopCodeExecutor.invokeTarget(HadoopCodeExecutor.java:185)
    at org.springframework.data.hadoop.mapreduce.HadoopCodeExecutor.runCode(HadoopCodeExecutor.java:102)
    at org.springframework.data.hadoop.mapreduce.JarTasklet.execute(JarTasklet.java:32)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
    at org.springframework.aop.support.DelegatingIntroductionInterceptor.doProceed(DelegatingIntroductionInterceptor.java:132)
    at org.springframework.aop.support.DelegatingIntroductionInterceptor.invoke(DelegatingIntroductionInterceptor.java:120)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
    at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
    at com.sun.proxy.$Proxy43.execute(Unknown Source)
    at org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:386)
    at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:131)
    at org.springframework.batch.core.step.tasklet.TaskletStep$2.doInChunkContext(TaskletStep.java:264)
    at org.springframework.batch.core.scope.context.StepContextRepeatCallback.doInIteration(StepContextRepeatCallback.java:76)
    at org.springframework.batch.repeat.support.RepeatTemplate.getNextResult(RepeatTemplate.java:367)
    at org.springframework.batch.repeat.support.RepeatTemplate.executeInternal(RepeatTemplate.java:214)
    at org.springframework.batch.repeat.support.RepeatTemplate.iterate(RepeatTemplate.java:143)
    at org.springframework.batch.core.step.tasklet.TaskletStep.doExecute(TaskletStep.java:250)
    at org.springframework.batch.core.step.AbstractStep.execute(AbstractStep.java:195)
    at org.springframework.batch.core.job.SimpleStepHandler.handleStep(SimpleStepHandler.java:135)
    at org.springframework.batch.core.job.flow.JobFlowExecutor.executeStep(JobFlowExecutor.java:61)
    at org.springframework.batch.core.job.flow.support.state.StepState.handle(StepState.java:60)
    at org.springframework.batch.core.job.flow.support.SimpleFlow.resume(SimpleFlow.java:144)
    at org.springframework.batch.core.job.flow.support.SimpleFlow.start(SimpleFlow.java:124)
    at org.springframework.batch.core.job.flow.FlowJob.doExecute(FlowJob.java:135)
    at org.springframework.batch.core.job.AbstractJob.execute(AbstractJob.java:293)
    at org.springframework.batch.core.launch.support.SimpleJobLauncher$1.run(SimpleJobLauncher.java:120)
    at org.springframework.core.task.SyncTaskExecutor.execute(SyncTaskExecutor.java:49)
    at org.springframework.batch.core.launch.support.SimpleJobLauncher.run(SimpleJobLauncher.java:114)
    at org.springframework.batch.core.launch.support.CommandLineJobRunner.start(CommandLineJobRunner.java:349)
    at org.springframework.batch.core.launch.support.CommandLineJobRunner.main(CommandLineJobRunner.java:574)

这是该步骤使用的tasklet配置。 我正在尝试使用files属性将输入文件传递给HDFS。 该文件显示在Hadoop日志中。

<hdp:jar-tasklet id="ingestJarTasklet" scope="step"
                 jar="file:${ingest.job.jar.path}"
                 main-class="com.atsid.cloudbase.ngram.ingest.mapreduce.NGramIngestJobRunner"
                 libs="${ingest.job.libs}"
                 files="#{seqFileLocation.URI.toString()}"
                 configuration-ref="hadoopConfiguration">
    ngram.jobrunner.input.document.sequence.file=${job.file.location}#{seqFileLocation.filename}
</hdp:jar-tasklet>

您的输入路径是/user/hduser/DocumentsPTOgrants2007_2011.seq还是输入hdfs://ngram-test:9000/user/hduser/DocumentsPTOgrants2007_2011.seq

如果使用第二个,请尝试第一个,并确保存在DocumentsPTOgrants2007_2011.seq

您可以通过以下命令查看它是否存在: hadoop dfs -ls ./

暂无
暂无

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM