Tag [apache-crunch] - Stack Overflow

How to write the output of Apache Crunch to an Amazon S3 bucket

Is there a way to write our Apache Crunch output to an S3 bucket? The Crunch pipeline has a write method that takes a Target as a parameter. Is there a way to add S3 as a target for the Crunch write method? ...

Write an Apache Crunch PCollection to multiple output files

I have a Crunch DoFn that produces a PCollection. Currently I write the PCollection to a single Avro file, but I want to write it to multiple files. ...

Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)

*When running an Apache Crunch MapReduce pipeline, which credentials should be provided to Kerberos to resolve this exception? Logging in with the kinit command makes no difference.* The logs are as follows: ...

How to execute one particular workflow action in Oozie if I killed the Oozie workflow manually?

I have the following Oozie workflow. Suppose I manually kill the job while the "Do_task1" action is running. Even though I killed the Oozie job manually (while "Do_task1" was running), I still want the "Do_task2" action to execute. How can I do this? ...

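Hadoop java.lang.RuntimeException: java.lang.NoSuchMethodException

I am writing some MapReduce code with Apache Crunch. I have the following class, which holds some data that is passed around in the MapReduce code, but I am getting an exception and don't know why. Here is the class interface, and here is the implementation of the class itself. (I do have a default empty constructor here.) Here is the exception I get in the map phase. ...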
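The excerpt doesn't include the class, so the actual cause can't be confirmed. This exception usually means Hadoop could not find a true zero-argument constructor: `ReflectionUtils.newInstance` calls `getDeclaredConstructor()` and wraps any failure in a `RuntimeException`. There are two classic ways to hit it even with an empty constructor in the source. One is declaring the data class as a non-static inner class, where javac adds a hidden outer-instance parameter. The other is handing Hadoop the interface type instead of the implementation. The sketch below reproduces both cases with plain JDK reflection. `RecordHolder`, `RecordApi`, and `NoArgConstructorCheck` are hypothetical stand-ins, not the asker's code.

```java
import java.lang.reflect.Constructor;

// Hypothetical interface standing in for the "class interface" in the question.
// Interfaces have no constructors at all.
interface RecordApi {
    String id();
}

// Hypothetical holder for the two ways a data class can be nested.
class RecordHolder {
    // Non-static inner class: javac gives this "empty" constructor a hidden
    // RecordHolder parameter, so no zero-argument constructor exists.
    class InnerRecord {
        InnerRecord() { }
    }

    // Static nested class: the empty constructor really takes no arguments.
    static class NestedRecord {
        NestedRecord() { }
    }
}

class NoArgConstructorCheck {
    // Looks up a declared zero-argument constructor and calls it, following
    // the same getDeclaredConstructor() + setAccessible pattern as Hadoop's
    // ReflectionUtils.newInstance.
    static boolean canInstantiateWithoutArgs(Class<?> cls) {
        try {
            Constructor<?> ctor = cls.getDeclaredConstructor();
            ctor.setAccessible(true);
            ctor.newInstance();
            return true;
        } catch (NoSuchMethodException e) {
            // This is the exception Hadoop wraps in its RuntimeException.
            return false;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("InnerRecord:  " + canInstantiateWithoutArgs(RecordHolder.InnerRecord.class));
        System.out.println("NestedRecord: " + canInstantiateWithoutArgs(RecordHolder.NestedRecord.class));
        System.out.println("RecordApi:    " + canInstantiateWithoutArgs(RecordApi.class));
    }
}
```

Running `main` prints `false` for `InnerRecord` and `RecordApi` and `true` for `NestedRecord`. Making the data class top-level or `static`, and registering the concrete implementation class rather than the interface, are the usual fixes for these two cases.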

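Apache Crunch unable to write output

It may be an oversight, but I can't figure out why Apache Crunch doesn't write output to a file for a very simple program I'm writing to learn Crunch. Here is the code: Here is the logging I see when I run this jar with hadoop: The input file is very simple and looks like this: Although the logging indicates that the output should ...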

Apache Crunch: How to set multiple input paths?

I have a problem: I can't set multiple input paths when using Apache Crunch. How can I solve this? ...

What happens when calling Apache Crunch pipeline read twice on two different sources?

When making the following calls (per the Apache Crunch documentation on reading data), is it the same pipeline that reads data from the two sources and then joins the data together? ...
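Reads on the same `Pipeline` object all belong to that one pipeline. Each `pipeline.read(...)` adds a source to its plan, and nothing executes until the pipeline is run (for example with `run()` or `done()`). The two resulting collections can then be combined with an ordinary operation such as `Join.join(left, right)` from `org.apache.crunch.lib.Join`, which, for two `PTable`s, yields the matching value pairs for each key. The sketch below is a plain-Java model of the *result* of such an inner join only. It uses no Crunch classes and does not reflect how Crunch plans or shuffles the work. `InnerJoinModel` and the sample data are hypothetical.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of an inner join between two keyed inputs, standing in for
// two PTables read from two different sources by the same pipeline.
class InnerJoinModel {

    static <K, U, V> List<Map.Entry<K, Map.Entry<U, V>>> join(
            List<Map.Entry<K, U>> left, List<Map.Entry<K, V>> right) {
        // Index the right-hand input by key.
        Map<K, List<V>> rightByKey = new LinkedHashMap<>();
        for (Map.Entry<K, V> e : right) {
            rightByKey.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        // Emit one (key, (leftValue, rightValue)) row per matching combination.
        // Keys missing from either side produce nothing (inner join).
        List<Map.Entry<K, Map.Entry<U, V>>> joined = new ArrayList<>();
        for (Map.Entry<K, U> l : left) {
            for (V r : rightByKey.getOrDefault(l.getKey(), List.of())) {
                Map.Entry<U, V> pair = new SimpleEntry<>(l.getValue(), r);
                joined.add(new SimpleEntry<>(l.getKey(), pair));
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        // Hypothetical inputs: users from one source, orders from another.
        List<Map.Entry<String, String>> users = List.of(Map.entry("u1", "Ann"), Map.entry("u2", "Bo"));
        List<Map.Entry<String, Integer>> orders =
                List.of(Map.entry("u1", 30), Map.entry("u1", 12), Map.entry("u3", 7));
        join(users, orders).forEach(System.out::println);
    }
}
```

Only `u1` appears in the output: `u2` has no orders and `u3` has no user, so an inner join drops both.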

How to run an Apache Crunch application without Hadoop?

I heard that Apache Crunch is a facade that can run applications without Hadoop. Is this true? If so, how? In the Apache Crunch getting-started guide, the first example includes a hadoop command: can hadoop be omitted? ...

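Could not find or load main class while trying to run project from IntelliJ

I downloaded the project and imported it into IntelliJ as an existing Maven project. Now I'm trying to run the main function, but it fails with the error message: What is this and how can I fix it? UPDATE: If I create a new Hello World Maven project from scratch, it works. UPDATE 2 ...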

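How could I define a DoFn in Apache Crunch having a "void" data type?

Basically, I don't need the output of the DoFn; I just want to update a MySQL database for every record I get in the DoFn. So how do I define a DoFn with a void data type? In short, I don't want to emit anything from the DoFn. ...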

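Iterating over a PTable in Crunch

I have the following PTable. For somePTable2 above, I want to create a new file for each record in somePTable2. Is there any way to iterate over somePTable2 so that I can access each record? I know I can apply a DoFn on somePTable2, but can I apply the pipeline.write() operation inside a DoFn ...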

Is there a generic way of converting PCollection to PTable in Apache Crunch?

I have these methods in a util class that convert a specific PCollection to a specific PTable. How can I implement a generic version of these methods? ...
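Crunch's `PCollection` interface already has a generic keying method, `by(MapFn<S, K> extractKeyFn, PType<K> keyType)`, which returns a `PTable<K, S>`. A single generic helper therefore mostly reduces to passing the right key function and key `PType` to it. The plain-Java sketch below models only what that keying step computes. `KeyByModel` and `keyBy` are hypothetical names, and `java.util.function.Function` stands in for Crunch's `MapFn`.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Plain-Java model of keying a collection: each element s becomes (keyFn(s), s).
// No Crunch classes are used; Function plays the role of a MapFn.
class KeyByModel {

    // One generic method replaces the per-type helpers: the caller supplies
    // only the key-extraction function.
    static <S, K> List<Map.Entry<K, S>> keyBy(List<S> items, Function<? super S, ? extends K> keyFn) {
        List<Map.Entry<K, S>> table = new ArrayList<>(items.size());
        for (S item : items) {
            K key = keyFn.apply(item);
            table.add(new SimpleEntry<>(key, item));
        }
        return table;
    }

    public static void main(String[] args) {
        // The same helper keys by an Integer (length) or by a Character (initial).
        List<Map.Entry<Integer, String>> byLength = keyBy(List.of("apple", "kiwi", "fig"), String::length);
        List<Map.Entry<Character, String>> byInitial = keyBy(List.of("apple", "kiwi"), s -> s.charAt(0));
        System.out.println(byLength);
        System.out.println(byInitial);
    }
}
```

In Crunch, the equivalent generic call also needs the key's `PType`, for example `Writables.strings()` or `Avros.strings()` for a `String` key, so a generic helper would take that as a parameter too.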

Link a Crunch Spark pipeline with a Spark application beginning with a SparkSession instance

A Crunch pipeline can take a Java Spark context as a parameter, but what if the Spark application starts with a SparkSession instance (because the Spark Java program contains Datasets and needs Spark SQL)? In that case, how do I add another layer of abstraction (a Crunch pipeline) to the Spark application? ...

java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/mapreduce/MultiTableInputFormat

When running a MapReduce job test on a Hadoop minicluster, I get the error: java.lang.NoClassDefFoundError: org.apache.crunch.io.hbase.HBaseSourceTarget.(HBaseSourceTarget.java: ...

java.lang.UnsatisfiedLinkError when writing using Crunch MemPipeline

I'm using com.cloudera.crunch version "0.3.0-3-cdh-5.2.1". I have a small program that reads some Avro and filters out invalid data based on certain conditions. I'm using pipeline.write(PCollection, AvroFileTarget) to write out the invalid data. ...

How does Apache Crunch PTable collectValues work internally?

I'm going through some documentation on HDFS architecture and Apache Crunch PTables. As I understand it, when we generate a PTable, the data is stored internally in HDFS across the DataNodes. That means if I have a PTable with <K1,V1>, <K2,V2>, <K1,V3>, <K3,V4> ...
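In terms of its result, `collectValues()` turns a `PTable<K, V>` into a `PTable<K, Collection<V>>` with one entry per key. Because the pairs for one key may start out on different nodes, this requires a grouping step. In Crunch, `collectValues` is built on `groupByKey()`, which on MapReduce is a shuffle, so all values for a key meet at the same reducer regardless of which DataNodes held the input blocks. The plain-Java sketch below reproduces only the grouping semantics, using the pairs from the question. It says nothing about physical data placement. `CollectValuesModel` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java model of what PTable.collectValues() returns: all values for a
// key gathered into one collection. No Crunch classes are used, and the
// in-memory map stands in for the shuffle that brings each key's pairs together.
class CollectValuesModel {

    static <K, V> Map<K, Collection<V>> collectValues(List<Map.Entry<K, V>> table) {
        Map<K, Collection<V>> grouped = new LinkedHashMap<>();
        for (Map.Entry<K, V> pair : table) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    public static void main(String[] args) {
        // The pairs from the question: <K1,V1>, <K2,V2>, <K1,V3>, <K3,V4>.
        List<Map.Entry<String, String>> table = List.of(
                Map.entry("K1", "V1"), Map.entry("K2", "V2"),
                Map.entry("K1", "V3"), Map.entry("K3", "V4"));
        System.out.println(collectValues(table));
    }
}
```

`main` prints `{K1=[V1, V3], K2=[V2], K3=[V4]}`. The model keeps values in arrival order, but the per-key value order in a real Crunch run should not be relied on.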

Hadoop Job: Error injecting constructor, JAXBException

A MapReduce job implemented in an Apache Crunch pipeline fails with the error message Error injecting constructor, javax.xml.bind.JAXBException: property "retainReferenceToInfo" is not supporte ...

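How to convert existing MapReduce applications to Crunch?

I have implemented several (about a dozen) MapReduce tasks, each of which is part of a workflow executed by a simple bash script. For various reasons, I want to move the workflow to Apache Crunch. However, it's not clear to me how to run my MapReduce tasks as Crunch functions without reimplementing them. Is there a straightforward way to ...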

Which jobs can MapReduce do that Apache Crunch can't?

I'm studying Apache Crunch. As far as I know, Crunch is an abstraction framework built on top of the MapReduce framework. I plan to use Crunch instead of the MapReduce framework. My question is: what can MapReduce do that Crunch can't? ...