
Location of a Hadoop job input file

I'm trying to generate a sequence file in my Spring Batch job to be passed to a Hadoop map/reduce job. I managed to get the job to work once by manually copying the file onto HDFS. When it runs in my local system test it works fine, because the local filesystem finds the file. But when I attempt to deploy it to a remote Hadoop instance, I get the following exception.

org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://ngram-test:9000/user/hduser/DocumentsPTOgrants2007_2011.seq
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:224)
    at org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:55)
    at org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:241)
    at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:885)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:779)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:432)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:447)
    at com.atsid.hadoop.jobs.AbstractJobRunner.executeJob(AbstractJobRunner.java:70)
    at com.atsid.hadoop.jobs.AbstractJobRunner.run(AbstractJobRunner.java:36)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
    at com.atsid.cloudbase.ngram.ingest.mapreduce.NGramIngestJobRunner.main(NGramIngestJobRunner.java:34)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.springframework.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:191)
    at org.springframework.data.hadoop.mapreduce.JarExecutor.invokeTargetObject(JarExecutor.java:71)
    at org.springframework.data.hadoop.mapreduce.HadoopCodeExecutor.invokeTarget(HadoopCodeExecutor.java:185)
    at org.springframework.data.hadoop.mapreduce.HadoopCodeExecutor.runCode(HadoopCodeExecutor.java:102)
    at org.springframework.data.hadoop.mapreduce.JarTasklet.execute(JarTasklet.java:32)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:317)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:183)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:150)
    at org.springframework.aop.support.DelegatingIntroductionInterceptor.doProceed(DelegatingIntroductionInterceptor.java:132)
    at org.springframework.aop.support.DelegatingIntroductionInterceptor.invoke(DelegatingIntroductionInterceptor.java:120)
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:172)
    at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:204)
    at com.sun.proxy.$Proxy43.execute(Unknown Source)
    at org.springframework.batch.core.step.tasklet.TaskletStep$ChunkTransactionCallback.doInTransaction(TaskletStep.java:386)
    at org.springframework.transaction.support.TransactionTemplate.execute(TransactionTemplate.java:131)
    at org.springframework.batch.core.step.tasklet.TaskletStep$2.doInChunkContext(TaskletStep.java:264)
    at org.springframework.batch.core.scope.context.StepContextRepeatCallback.doInIteration(StepContextRepeatCallback.java:76)
    at org.springframework.batch.repeat.support.RepeatTemplate.getNextResult(RepeatTemplate.java:367)
    at org.springframework.batch.repeat.support.RepeatTemplate.executeInternal(RepeatTemplate.java:214)
    at org.springframework.batch.repeat.support.RepeatTemplate.iterate(RepeatTemplate.java:143)
    at org.springframework.batch.core.step.tasklet.TaskletStep.doExecute(TaskletStep.java:250)
    at org.springframework.batch.core.step.AbstractStep.execute(AbstractStep.java:195)
    at org.springframework.batch.core.job.SimpleStepHandler.handleStep(SimpleStepHandler.java:135)
    at org.springframework.batch.core.job.flow.JobFlowExecutor.executeStep(JobFlowExecutor.java:61)
    at org.springframework.batch.core.job.flow.support.state.StepState.handle(StepState.java:60)
    at org.springframework.batch.core.job.flow.support.SimpleFlow.resume(SimpleFlow.java:144)
    at org.springframework.batch.core.job.flow.support.SimpleFlow.start(SimpleFlow.java:124)
    at org.springframework.batch.core.job.flow.FlowJob.doExecute(FlowJob.java:135)
    at org.springframework.batch.core.job.AbstractJob.execute(AbstractJob.java:293)
    at org.springframework.batch.core.launch.support.SimpleJobLauncher$1.run(SimpleJobLauncher.java:120)
    at org.springframework.core.task.SyncTaskExecutor.execute(SyncTaskExecutor.java:49)
    at org.springframework.batch.core.launch.support.SimpleJobLauncher.run(SimpleJobLauncher.java:114)
    at org.springframework.batch.core.launch.support.CommandLineJobRunner.start(CommandLineJobRunner.java:349)
    at org.springframework.batch.core.launch.support.CommandLineJobRunner.main(CommandLineJobRunner.java:574)

Here's the tasklet configuration used by the step. I'm attempting to use the files attribute to pass the input file to HDFS. The file does appear in the Hadoop logs.

<hdp:jar-tasklet id="ingestJarTasklet" scope="step"
                 jar="file:${ingest.job.jar.path}"
                 main-class="com.atsid.cloudbase.ngram.ingest.mapreduce.NGramIngestJobRunner"
                 libs="${ingest.job.libs}"
                 files="#{seqFileLocation.URI.toString()}"
                 configuration-ref="hadoopConfiguration">
    ngram.jobrunner.input.document.sequence.file=${job.file.location}#{seqFileLocation.filename}
</hdp:jar-tasklet>
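
Since the job ran once after the file was copied onto HDFS by hand, one way to rule out the staging step while debugging is to push the generated sequence file to the cluster explicitly before submitting the job. Below is a minimal sketch using Hadoop's FileSystem API; the hdfs://ngram-test:9000 URI and both paths are taken from the exception above and stand in for your real locations.

    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class SeqFileUploader {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Connect straight to the remote namenode named in the exception.
            FileSystem fs = FileSystem.get(URI.create("hdfs://ngram-test:9000"), conf);
            // Placeholder paths: the locally generated sequence file and the
            // HDFS location the job's input format expects to read from.
            Path local = new Path("/tmp/DocumentsPTOgrants2007_2011.seq");
            Path remote = new Path("/user/hduser/DocumentsPTOgrants2007_2011.seq");
            // keep the local source, overwrite any existing destination
            fs.copyFromLocalFile(false, true, local, remote);
            System.out.println("uploaded, exists = " + fs.exists(remote));
            fs.close();
        }
    }

If the files attribute behaves like Hadoop's generic -files option (staging files via the distributed cache for task-side access rather than writing them to an arbitrary HDFS path), then an explicit copy like this is the more direct way to get the file to where the input format looks for it.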

Is your input path /user/hduser/DocumentsPTOgrants2007_2011.seq, or did you give hdfs://ngram-test:9000/user/hduser/DocumentsPTOgrants2007_2011.seq?

If you used the second one, try the first one, and make sure that DocumentsPTOgrants2007_2011.seq is actually there.

You can check whether it is there with this command: hadoop dfs -ls ./
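
If you'd rather have the job runner fail fast with a clearer message, you can also verify the input from Java before submission. A minimal sketch, assuming the path from the exception and whatever configuration the job itself picks up from the classpath:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class InputPathCheck {
        public static void main(String[] args) throws Exception {
            // An unqualified path resolves against fs.default.name from the
            // classpath configuration, i.e. the same filesystem the job uses.
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path input = new Path(args.length > 0 ? args[0]
                    : "/user/hduser/DocumentsPTOgrants2007_2011.seq");
            System.out.println(fs.getUri() + " " + input
                    + " exists: " + fs.exists(input));
        }
    }

Running this against the remote cluster's configuration tells you immediately whether the path the job is about to submit actually resolves on that cluster or only on your local filesystem.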
