Hadoop: File Copy with Cascading 2.5.1 and Hadoop 2.2.0
I recently set up a pseudo-distributed Hadoop 2.2.0 cluster on my Mac OS X machine following this guide. Then I tried the basic Cascading file copy with Cascading 2.5.1. However, when I compiled the project using Maven, I got the following error:
[ERROR] /Users/david/IdeaProjects//CascadingIntro/src/main/java/com/example/CascadingIntro.java:[24,24]
cannot access org.apache.hadoop.mapred.JobConf
class file for org.apache.hadoop.mapred.JobConf not found
What am I doing wrong, and how do I fix this? I believe that Cascading 2.5.1 is compatible with Hadoop 2.2.0, based on this page on Cascading.org.
My pom.xml is as follows:
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>CascadingIntro</groupId>
  <artifactId>CascadingIntro</artifactId>
  <version>1.0-SNAPSHOT</version>
  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>
  <repositories>
    <repository>
      <id>conjars.org</id>
      <url>http://conjars.org/repo</url>
    </repository>
  </repositories>
  <dependencies>
    <dependency>
      <groupId>cascading</groupId>
      <artifactId>cascading-core</artifactId>
      <version>2.5.1</version>
    </dependency>
    <dependency>
      <groupId>cascading</groupId>
      <artifactId>cascading-hadoop</artifactId>
      <version>2.5.1</version>
    </dependency>
  </dependencies>
  <build>
    <finalName>CascadingIntro</finalName>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.0</version>
        <configuration>
          <source>1.7</source>
          <target>1.7</target>
        </configuration>
      </plugin>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-jar-plugin</artifactId>
        <version>2.4</version>
        <configuration>
          <archive>
            <manifest>
              <mainClass>com.example.CascadingIntro</mainClass>
            </manifest>
          </archive>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>
And here is my CascadingIntro class:
package com.example;

import cascading.flow.FlowDef;
import cascading.flow.hadoop.HadoopFlowConnector;
import cascading.pipe.Pipe;
import cascading.property.AppProps;
import cascading.scheme.hadoop.TextDelimited;
import cascading.tap.Tap;
import cascading.tap.hadoop.Hfs;

import java.util.Properties;

public class CascadingIntro {
    public static void main(String[] args) {
        Properties properties = new Properties();
        AppProps.setApplicationJarClass(properties, CascadingIntro.class);
        HadoopFlowConnector flowConnector = new HadoopFlowConnector(properties);

        // Source and sink taps: tab-delimited text with a header row
        String inputPath = args[0];
        Tap inputTap = new Hfs(new TextDelimited(true, "\t"), inputPath);
        String outputPath = args[1];
        Tap outputTap = new Hfs(new TextDelimited(true, "\t"), outputPath);

        // A pipe with no operations simply copies source to sink
        Pipe copyPipe = new Pipe("copy");
        FlowDef flowDef = FlowDef
            .flowDef()
            .addSource(copyPipe, inputTap)
            .addTailSink(copyPipe, outputTap);
        flowConnector.connect(flowDef).complete();
    }
}
You need to add hadoop-client to your dependencies:
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-client</artifactId>
  <version>2.2.0</version>
</dependency>
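The compile error occurs because the cascading-hadoop artifact does not pull the Hadoop classes (such as org.apache.hadoop.mapred.JobConf) onto your compile classpath transitively, so your own POM has to supply them. As a sketch, the completed `<dependencies>` section could look like the following; the `provided` scope is an assumption that the Hadoop jars will be on the classpath at runtime (e.g. when launched with `hadoop jar`), so drop it if you intend to bundle Hadoop into your artifact:

```xml
<!-- Sketch: the question's dependencies plus hadoop-client.
     The provided scope is an assumption for jobs launched via
     `hadoop jar`, where the cluster supplies the Hadoop classes. -->
<dependencies>
  <dependency>
    <groupId>cascading</groupId>
    <artifactId>cascading-core</artifactId>
    <version>2.5.1</version>
  </dependency>
  <dependency>
    <groupId>cascading</groupId>
    <artifactId>cascading-hadoop</artifactId>
    <version>2.5.1</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.2.0</version>
    <scope>provided</scope>
  </dependency>
</dependencies>
```

After updating the POM, `mvn dependency:tree` should show hadoop-client (and its transitive hadoop-mapreduce artifacts) on the compile classpath, and the JobConf error should disappear on recompile.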