

Apache Spark - JDBC Sources

Did anyone manage to pull data out of, or at least connect to, an RDBMS through JDBC using the new built-in Spark SQL data source released in 1.3, instead of JdbcRDD?

https://databricks.com/blog/2015/03/24/spark-sql-graduates-from-alpha-in-spark-1-3.html

I've tried to apply the example mentioned in the post above, but it didn't work: it gives me an error. I thought maybe someone could provide a full example in Scala of how to connect and query the data.

Yes. There are two ways of doing it.

  1. Programmatically, using the SQLContext load function.

The load function loads the JDBC data source as a DataFrame. If you would like to make this DataFrame available as a table in subsequent Spark SQL queries, it has to be registered using

yourDataFrame.registerTempTable("yourTableName")

If you'd like a complete example, check my blog post.
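To make the programmatic route concrete, here is a minimal sketch against the Spark 1.3 API. The JDBC URL, credentials, and table names below are placeholders, and it assumes the PostgreSQL JDBC driver is on the classpath of the driver and executors.

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object JdbcLoadExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("JdbcLoadExample"))
    val sqlContext = new SQLContext(sc)

    // load() returns a DataFrame backed by the JDBC data source (Spark 1.3 API)
    val df = sqlContext.load("jdbc", Map(
      "url"     -> "jdbc:postgresql://dbserver/yourdb?user=youruser&password=yourpassword",
      "dbtable" -> "schema.tablename"
    ))

    // Register the DataFrame so it can be referenced in Spark SQL queries
    df.registerTempTable("yourTableName")
    sqlContext.sql("SELECT * FROM yourTableName LIMIT 10").show()
  }
}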

  2. Using SQL.

I haven't tried this yet. Based on what I read in the documentation, it can be done like the example below.

CREATE TEMPORARY TABLE yourTableName
USING org.apache.spark.sql.jdbc
OPTIONS (
  url "jdbc:postgresql:dbserver",
  dbtable "schema.tablename"
)

As you can guess, it both loads the data and registers it as a table in the same command.

I was thinking that the SQLContext.sql function could be used to execute the above SQL DDL statement. But it throws the same error you mentioned:

failure: ''insert'' expected but identifier CREATE found

Based on all this, my conclusion for now is that this DDL statement is meant to be executed from a SQL client, with Spark acting as a database for it. That means if you connect to Spark from SQL Workbench or any other SQL editor through the Spark Thrift server, you could probably invoke it. If that is successful, you could also try to do it programmatically by using a JDBC/ODBC driver that connects to the Thrift server, as sketched below.
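A hedged sketch of that programmatic route might look like the following: a plain JDBC connection using the Hive JDBC driver (the Spark Thrift server speaks the HiveServer2 protocol), with the host, port, and table names as placeholders. Whether the temporary-table DDL is actually accepted this way is exactly what would need verifying.

import java.sql.DriverManager

object ThriftServerDdlSketch {
  def main(args: Array[String]): Unit = {
    // Assumes the Spark Thrift server is running on localhost:10000 and
    // the Hive JDBC driver is on the classpath
    Class.forName("org.apache.hive.jdbc.HiveDriver")
    val conn = DriverManager.getConnection("jdbc:hive2://localhost:10000/default", "", "")
    try {
      val stmt = conn.createStatement()
      // Send the DDL to the Thrift server instead of calling SQLContext.sql locally
      stmt.execute(
        """CREATE TEMPORARY TABLE yourTableName
          |USING org.apache.spark.sql.jdbc
          |OPTIONS (
          |  url "jdbc:postgresql:dbserver",
          |  dbtable "schema.tablename"
          |)""".stripMargin)

      // If the table was registered, it can be queried over the same connection
      val rs = stmt.executeQuery("SELECT * FROM yourTableName LIMIT 10")
      while (rs.next()) {
        println(rs.getString(1))
      }
    } finally {
      conn.close()
    }
  }
}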
