
What is the difference between Apache Spark and Apache Apex?

Asked 2016-02-23 14:11:00. Tags: apache-spark, machine-learning, pyspark, stream-processing, apache-apex

Apache Apex is an open source, enterprise-grade, unified stream and batch processing platform. It is used in the GE Predix platform for IoT. What are the key differences between these two platforms?

Questions

From a data science perspective, how is it different from Spark?
Does Apache Apex provide functionality like Spark MLlib? If we have to build scalable ML models on Apache Apex, how do we do it and which language should we use?
Will data scientists have to learn Java to build scalable ML models? Does it have a Python API like PySpark?
Can Apache Apex be integrated with Spark, and can we use Spark MLlib on top of Apex to build ML models?

1 answer

Apache Apex is an engine for processing streaming data. Other engines that aim for the same thing include Apache Storm and Apache Flink. The differentiating factor for Apache Apex is that it comes with built-in support for fault tolerance and scalability, and a focus on operability, all of which are key considerations in production use cases.

Comparing it with Spark: Apache Spark is fundamentally a batch processing engine. If you consider Spark Streaming (which uses Spark underneath), then it is micro-batch processing. In contrast, Apache Apex is a true stream processing engine, in the sense that an incoming record does NOT have to wait for the next record before it is processed. A record is processed and sent to the next stage of processing as soon as it arrives.
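To make the micro-batch versus record-at-a-time distinction concrete, here is a minimal toy model in plain Python. It does not use the Spark or Apex APIs; the function names, the `(arrival_ms, payload)` record shape, and the 1000 ms interval are all assumptions made for illustration. It only shows when each record's result becomes available under the two models, assuming records arrive in time order.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

# Hypothetical record shape: (arrival time in milliseconds, payload).
Event = Tuple[int, str]


def per_record(events: Iterable[Event],
               handle: Callable[[str], str]) -> Iterator[Event]:
    """Record-at-a-time model: each record is handled when it arrives."""
    for arrival_ms, payload in events:
        # The result is available at the record's own arrival time.
        yield arrival_ms, handle(payload)


def micro_batch(events: Iterable[Event],
                handle: Callable[[str], str],
                interval_ms: int) -> Iterator[Event]:
    """Micro-batch model: records are buffered until their interval closes."""
    batch: List[str] = []
    window_end = None
    for arrival_ms, payload in events:
        # End of the fixed-size interval this record falls into.
        end = (arrival_ms // interval_ms + 1) * interval_ms
        if window_end is not None and end != window_end:
            # The previous interval has closed: flush its buffered records.
            for buffered in batch:
                yield window_end, handle(buffered)
            batch = []
        window_end = end
        batch.append(payload)
    # Flush whatever is left in the last interval.
    for buffered in batch:
        yield window_end, handle(buffered)


if __name__ == "__main__":
    events = [(100, "a"), (400, "b"), (1200, "c")]
    print(list(per_record(events, str.upper)))
    print(list(micro_batch(events, str.upper, 1000)))
```

In this toy model the record arriving at 100 ms is available at 100 ms in the per-record case, but only at 1000 ms in the micro-batch case, because it waits for its interval to close. Real engines add scheduling and processing time on top of this, so treat the numbers as illustrative only.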

Currently, work is in progress to add support for integrating Apache Apex with machine learning libraries such as Apache SAMOA and H2O. See https://issues.apache.org/jira/browse/SAMOA-49
Currently, it supports Java and Scala: https://www.datatorrent.com/blog/blog-writing-apache-apex-application-in-scala/
For Python, you may try it using Jython, but I haven't tried it myself, so I'm not very sure about it.
Integrating with Spark may not be a good idea, considering they are two different processing engines. However, integration of Apache Apex with machine learning libraries is in progress.

If you have any other questions or feature requests, you can post them on the mailing list for Apache Apex users: https://mail-archives.apache.org/mod_mbox/incubator-apex-users/