How do I run an Apache Crunch application without Hadoop?

Question

I heard that Apache Crunch is a facade and that it can run applications without Hadoop. Is this true?

If yes, how do I do that?

In the Apache Crunch Getting Started guide, the very first example includes the hadoop command:

$ hadoop jar target/crunch-demo-1.0-SNAPSHOT-job.jar <in> <out>

Is it possible to omit hadoop?

Answer 1

Maybe there is a misunderstanding: what you don't need is a Hadoop cluster. Hive, Pig, and Spark can all be run locally, or against filesystems other than HDFS.

As far as I know the library, you do, however, need the Hadoop API (which is what hadoop jar will load for you).

In other words, you could set the input and output directories to a local file:// path to get around needing HDFS.
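
As a rough sketch of the file:// idea, the command from the question can be pointed at local directories instead of HDFS. The two directory names below are hypothetical, and this assumes the Hadoop client runs MapReduce in its default local mode rather than being configured for a cluster.

```shell
# Hypothetical local directories; adjust to your machine.
IN_DIR="file:///tmp/crunch-in"
OUT_DIR="file:///tmp/crunch-out"

# JAR name taken from the Getting Started command in the question.
APP_JAR="target/crunch-demo-1.0-SNAPSHOT-job.jar"

# Run only when the hadoop launcher is installed; otherwise just show the command.
if command -v hadoop >/dev/null 2>&1; then
  hadoop jar "$APP_JAR" "$IN_DIR" "$OUT_DIR"
else
  echo "hadoop launcher not found; would run: hadoop jar $APP_JAR $IN_DIR $OUT_DIR"
fi
```

The Hadoop libraries are still loaded here, but no HDFS and no cluster are involved.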

You can export CLASSPATH yourself to include the Hadoop libraries and the application JAR, and then launch the main class with java directly (note that java -jar ignores CLASSPATH).
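
A minimal sketch of the CLASSPATH approach follows. The library directory and the main class name are assumptions, not taken from the Crunch documentation. The directory is assumed to hold the Hadoop JARs plus Crunch and its dependencies, because plain java does not read JARs nested inside a job JAR the way hadoop jar does.

```shell
# Hypothetical directory containing the Hadoop, Crunch, and dependency JARs.
HADOOP_LIBS="${HADOOP_LIBS:-$HOME/hadoop-libs}"

# JAR name from the question; the main class name is an assumption.
APP_JAR="target/crunch-demo-1.0-SNAPSHOT-job.jar"
MAIN_CLASS="com.example.WordCount"

# The quoted "dir/*" wildcard is expanded by the JVM, not by the shell.
export CLASSPATH="$APP_JAR:$HADOOP_LIBS/*"

# Launch the main class directly; java -jar would ignore CLASSPATH.
if [ -d "$HADOOP_LIBS" ] && [ -f "$APP_JAR" ]; then
  java "$MAIN_CLASS" file:///tmp/crunch-in file:///tmp/crunch-out
else
  echo "Missing $HADOOP_LIBS or $APP_JAR; nothing was run." >&2
fi
```

This replaces the hadoop launcher script, not the Hadoop API itself: the Hadoop classes still have to be on the classpath.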
