

Apache Spark - JDBC Sources

Did anyone manage to pull data out of, or at least connect to, an RDBMS through JDBC using the new built-in Spark SQL data source released in 1.3, instead of JdbcRDD?

https://databricks.com/blog/2015/03/24/spark-sql-graduates-from-alpha-in-spark-1-3.html

I've tried to apply the example mentioned in the post above, but it didn't work: it gives me an error. I thought maybe someone could provide a full example in Scala of how to connect and query the data.

Yes. There are two ways of doing it.

  1. Programmatically, using the SQLContext load function.

The load function loads the JDBC data source as a DataFrame. If you would like to make this DataFrame available as a table in subsequent Spark SQL queries, it has to be registered using

yourDataFrame.registerTempTable("yourTableName")

If you'd like a complete example, check my blog post.
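To make the programmatic route concrete, here is a minimal sketch against the Spark 1.3 API. The JDBC URL, credentials, and table names below are placeholders, and it assumes the PostgreSQL JDBC driver is on the classpath of the driver and executors.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object JdbcLoadExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("JdbcLoadExample"))
    val sqlContext = new SQLContext(sc)

    // load() returns a DataFrame backed by the JDBC data source (Spark 1.3 API)
    val df = sqlContext.load("jdbc", Map(
      "url"     -> "jdbc:postgresql://dbserver/yourdb?user=youruser&password=yourpassword",
      "dbtable" -> "schema.tablename"
    ))

    // Register the DataFrame so it can be referenced in Spark SQL queries
    df.registerTempTable("yourTableName")
    sqlContext.sql("SELECT * FROM yourTableName LIMIT 10").show()
  }
}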

  2. Using SQL.

I haven't tried this yet. Based on what I read in the documentation, it can be done like the example below.

CREATE TEMPORARY TABLE yourTableName
USING org.apache.spark.sql.jdbc
OPTIONS (
  url "jdbc:postgresql:dbserver",
  dbtable "schema.tablename"
)

As you can guess, it both loads the data and registers it as a table in the same command.

I was thinking that the SQLContext.sql function could be used to execute the above SQL DDL statement. But it throws the same error you mentioned:

failure: ''insert'' expected but identifier CREATE found

Based on all this, my conclusion for now is that this DDL statement is meant to be executed from a SQL client, with Spark acting as a database for it. That means if you connect to Spark from SQL Workbench or any other SQL editor through the Spark Thrift server, you could probably invoke it. If that is successful, you could also try to do it programmatically by using a JDBC/ODBC driver that connects to the Thrift server, as sketched below.
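A hedged sketch of that programmatic route might look like the following: a plain JDBC connection using the Hive JDBC driver (the Spark Thrift server speaks the HiveServer2 protocol), with the host, port, and table names as placeholders. Whether the temporary-table DDL is actually accepted this way is exactly what would need verifying.

import java.sql.DriverManager

object ThriftServerDdlSketch {
  def main(args: Array[String]): Unit = {
    // Assumes the Spark Thrift server is running on localhost:10000 and
    // the Hive JDBC driver is on the classpath
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "", "")
    try {
      val stmt = conn.createStatement()
      // Send the DDL to the Thrift server instead of calling SQLContext.sql locally
      stmt.execute(
        """CREATE TEMPORARY TABLE yourTableName
          |USING org.apache.spark.sql.jdbc
          |OPTIONS (
          |  url "jdbc:postgresql:dbserver",
          |  dbtable "schema.tablename"
          |)""".stripMargin)

      // If the table was registered, it can be queried over the same connection
      val rs = stmt.executeQuery("SELECT * FROM yourTableName LIMIT 10")
      while (rs.next()) {
        println(rs.getString(1))
      }
    } finally {
      conn.close()
    }
  }
}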
