
How to run an Apache Crunch application without Hadoop?

I heard that Apache Crunch is a facade and that it can run applications without Hadoop. Is this true?

If yes, how do I do that?

In Apache Crunch Getting Started, the very first example includes the hadoop command:

$ hadoop jar target/crunch-demo-1.0-SNAPSHOT-job.jar <in> <out>

Is it possible to omit hadoop?

Maybe you misunderstood: what you don't need is a Hadoop cluster. Hive, Pig, and Spark can all be run locally, or against filesystems other than HDFS.

As far as I know the library, you do, however, need the Hadoop API (which is what hadoop jar loads for you).

In other words, you could set the input and output directories to a local file:// path to get around needing HDFS.
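As a rough illustration of the local-path idea, the sketch below prepares a local input directory and builds file:// URIs for it. The directory name crunch-local and the sample file are made up for the example; the JAR name is the one from the Getting Started command. The final command is only printed, since it still needs the hadoop launcher on the PATH.

```shell
# Hypothetical local working directory (an assumption, not from the Crunch docs).
WORK_DIR="${TMPDIR:-/tmp}/crunch-local"
mkdir -p "$WORK_DIR/in"
printf 'hello crunch\n' > "$WORK_DIR/in/sample.txt"

# file:// URIs point Hadoop's FileSystem API at the local disk instead of HDFS.
IN_URI="file://$WORK_DIR/in"
# The output directory should not exist yet; a job normally refuses to
# overwrite an existing output directory.
OUT_URI="file://$WORK_DIR/out"

# Printed rather than executed here; drop the echo to actually run it.
echo hadoop jar target/crunch-demo-1.0-SNAPSHOT-job.jar "$IN_URI" "$OUT_URI"
```

With an unconfigured Hadoop install this should run the job in local mode, with no cluster or HDFS daemon involved.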

You can export CLASSPATH yourself to include the Hadoop libraries, and then run the JAR with java directly instead of hadoop jar.
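One caveat worth flagging: the JVM ignores the CLASSPATH variable when it is started with java -jar, so a sketch of this approach usually passes the classpath with -cp and names the main class explicitly. In the example below, HADOOP_HOME, its share/hadoop layout, the class name com.example.WordCount, and the /tmp paths are all assumptions for illustration; only the JAR name comes from the question.

```shell
# Assumed install location and layout of a Hadoop binary distribution.
HADOOP_HOME="${HADOOP_HOME:-/opt/hadoop}"

# Application JAR first, then Hadoop's jars. "dir/*" is the JVM's own
# classpath wildcard, so it stays quoted to keep the shell from expanding it.
CP="target/crunch-demo-1.0-SNAPSHOT-job.jar"
for dir in common common/lib hdfs mapreduce yarn; do
  CP="$CP:$HADOOP_HOME/share/hadoop/$dir/*"
done

# com.example.WordCount is a placeholder for the application's real main class.
# Printed rather than executed; remove the echo to run it.
echo java -cp "$CP" com.example.WordCount file:///tmp/crunch-in file:///tmp/crunch-out
```

If the JAR bundles its dependencies under an internal lib/ directory, as Hadoop job JARs often do, a plain JVM will not see those nested jars, so Crunch and the other dependencies would have to be added to the classpath as well.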
