
Apache Spark - Backend servers

I've developed a reporting application in PHP. The application is built with HTML, CSS, JavaScript libraries, a charting library (Highcharts), and MySQL to store data. The user chooses some options in the front end and clicks a "Submit" button. The PHP layer then executes a set of required SQL queries and sends a JSON result back to the UI, where the charts and data tables are drawn.

The requirement now is to plug a big data solution, Apache Spark, into the existing application. For the last two weeks I've been researching whether I can somehow connect the PHP application, via a REST API or some sort of Spark SQL driver, to a Spark SQL server and execute the same set of SQL queries I have now on Spark SQL. I haven't found a solution yet. I've now started researching Java-based technologies such as Spring, as well as AngularJS, Node.js, and other MVC frameworks, with a view to rewriting the project from scratch. I'm not a big fan of Java development, as I'm not a hardcore developer (I build handy tools to get things done).

I did read this - https://cwiki.apache.org/confluence/display/Hive/HiveClient#HiveClient-PHP - but it looks like it's for a standalone Spark installation. In my case I'm dealing with a huge cluster.

I'd highly appreciate any direction here.

Yes, it can be done by using a HiveContext and the Spark SQL Thrift server in your Spark application.

You can run your Spark application and do all the processing there. After processing, if you are using a DataFrame, you just have to register it as a temporary table.

Now you can start a Thrift server from inside the Spark application.

After starting the Thrift server, you can query the temporary table and get the results and insights from PHP using the appropriate drivers (PHP cannot load JDBC drivers directly, so in practice this means an ODBC driver for Spark SQL/Hive, or a PHP Thrift client).
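The flow above can be sketched in Scala. This is a minimal sketch, not a drop-in implementation: the app name, input path, and view name are hypothetical, and it assumes a Spark build with Hive support plus the `spark-hive-thriftserver` module on the classpath.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.hive.thriftserver.HiveThriftServer2

object ReportServer {
  def main(args: Array[String]): Unit = {
    // Hypothetical app name; requires a Spark build with Hive support
    val spark = SparkSession.builder()
      .appName("reporting-backend")
      .enableHiveSupport()
      .getOrCreate()

    // Do the heavy processing once, then expose the result as a temp view
    val reportData = spark.read.parquet("/data/reports") // hypothetical path
    reportData.createOrReplaceTempView("report_data")

    // Start the Thrift server inside this application, so external clients
    // (e.g. PHP through an ODBC driver) can query the temp view over hive2
    HiveThriftServer2.startWithContext(spark.sqlContext)

    // Keep the application alive so the temp view stays registered
    Thread.currentThread().join()
  }
}
```

Note that the temporary view only exists for the lifetime of this application, which is why the driver must be kept running.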

Refer to the link below for more details: https://medium.com/@anicolaspp/apache-spark-as-a-distributed-sql-engine-4373e254e0f9#.ekc3cs28u

This might not be what you want, but if you would consider using Scala to build it, here is one possible solution:

  • Have a web server that either uses Spark standalone or connects to a cluster.
  • Use spark-highcharts to plot a Spark DataFrame with Highcharts.
  • Write some code that accepts options from the web front end and executes them on the backend web server.
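The last step, accepting an option from the web, can be as simple as mapping a whitelisted option name to a SQL statement before handing it to Spark. A minimal sketch (the option names, queries, and `report_data` table are made up for illustration):

```scala
// Maps a user-chosen report option to a SQL statement.
// Whitelisting avoids passing raw user input into Spark SQL.
object ReportQueries {
  private val queries: Map[String, String] = Map(
    "daily_totals" ->
      "SELECT day, SUM(amount) AS total FROM report_data GROUP BY day",
    "top_customers" ->
      "SELECT customer, SUM(amount) AS total FROM report_data " +
      "GROUP BY customer ORDER BY total DESC LIMIT 10"
  )

  // Returns None for anything not explicitly whitelisted
  def queryFor(option: String): Option[String] =
    queries.get(option.trim.toLowerCase)
}
```

In the backend handler this becomes something like `ReportQueries.queryFor(opt).map(spark.sql)`, with the resulting DataFrame handed to the charting layer.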

Are you using any specific cluster distribution, like Cloudera or Hortonworks?

With Cloudera, you should use Impala and the corresponding JDBC drivers. On HDP, you should use the Spark Thrift Server with the corresponding JDBC drivers.
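From a JVM client, either route looks much the same: a `jdbc:hive2://` URL pointing at the Thrift server. A hedged sketch (the host, credentials, and `report_data` view are placeholders, and the Hive JDBC driver must be on the classpath; the default Spark Thrift Server port is 10000):

```scala
import java.sql.DriverManager

object ThriftClient {
  def main(args: Array[String]): Unit = {
    // Placeholder host and credentials; port 10000 is the Thrift server default
    val conn = DriverManager.getConnection(
      "jdbc:hive2://spark-thrift-host:10000/default", "user", "")
    try {
      // Query the temp view registered by the Spark application
      val rs = conn.createStatement().executeQuery(
        "SELECT day, total FROM report_data LIMIT 5")
      while (rs.next()) {
        println(s"${rs.getString(1)} -> ${rs.getDouble(2)}")
      }
    } finally conn.close()
  }
}
```

PHP itself would reach the same endpoint through an ODBC driver for Spark SQL/Impala rather than JDBC.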
