Spark SQL SQLContext

I'm trying to select data from an MSSQL database via SQLContext.sql in a Spark application. The connection works, but I can't select data from the table, because the query always fails on the table name.

Here is my code:

import java.util.Properties

// Connection settings for the jTDS driver
val prop = new Properties()
val url2 = "jdbc:jtds:sqlserver://servername;instance=MSSQLSERVER;user=sa;password=Pass;"
prop.setProperty("user", "username")
prop.setProperty("driver", "net.sourceforge.jtds.jdbc.Driver")
prop.setProperty("password", "mypassword")
// Loads the table as a DataFrame, but does not register it under any name
val test = sqlContext.read.jdbc(url2, "[dbName].[dbo].[Table name]", prop)

sqlContext.sql("""
SELECT *
FROM 'dbName.dbo.Table name'
                 """)

I also tried the table name without the quotes (') and as [dbName].[dbo].[Table name], but the result is the same:

Exception in thread "main" java.lang.RuntimeException: [3.14] failure: ``union'' expected but `.' found

Dependencies:

// https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.11
libraryDependencies += "org.apache.spark" % "spark-core_2.10" % "1.6.1" //%"provided"

// https://mvnrepository.com/artifact/com.datastax.spark/spark-cassandra-connector_2.10
libraryDependencies += "com.datastax.spark" % "spark-cassandra-connector_2.10" % "1.6.0"

// https://mvnrepository.com/artifact/org.apache.spark/spark-sql_2.10
libraryDependencies += "org.apache.spark" % "spark-sql_2.10" % "1.6.1" //%"provided"

I think the problem in your code is that the query you pass to sqlContext has no access to the original table in the source database. It only has access to tables registered within the SQL context, for example with df.write.saveAsTable() or df.registerTempTable() (df.createTempView in Spark 2+). In your code, the DataFrame test is loaded but never registered under a name, so sqlContext.sql has nothing to resolve the table name against.

So, in your specific case, I can suggest a couple of options:

1) If you want the query to be executed on the source database, using the exact SQL syntax of that database, you can pass the query as the "dbtable" argument:

val query = "SELECT * FROM dbName.dbo.TableName"
val df = sqlContext.read.jdbc(url2, s"($query) AS subquery", prop)

df.show

Note that the query needs to be in parentheses, because it is placed into a "FROM" clause, as specified in the docs:

dbtable: The JDBC table that should be read. Note that anything that is valid in a FROM clause of a SQL query can be used. For example, instead of a full table you could also use a subquery in parentheses.
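Because the table name in the question contains a space, the string passed as "dbtable" also has to quote it the way SQL Server expects. The following is only a sketch: DbTable, quoteIdent, qualified and asSubquery are hypothetical helper names, not part of Spark or jTDS. It assumes T-SQL bracket quoting, where a closing bracket inside a name is doubled.

```scala
// Hypothetical helpers (not part of Spark): build strings for the JDBC "dbtable" argument.
object DbTable {

  /** Quote a single T-SQL identifier with brackets, doubling any closing bracket. */
  def quoteIdent(name: String): String =
    "[" + name.replace("]", "]]") + "]"

  /** Join name parts (database, schema, table) into one bracket-quoted qualified name. */
  def qualified(parts: String*): String =
    parts.map(quoteIdent).mkString(".")

  /** Wrap a query in parentheses with an alias so that it is valid inside a FROM clause. */
  def asSubquery(query: String, alias: String = "subquery"): String =
    s"($query) AS $alias"
}
```

With these helpers, DbTable.qualified("dbName", "dbo", "Table name") gives [dbName].[dbo].[Table name], and DbTable.asSubquery("SELECT * FROM " + DbTable.qualified("dbName", "dbo", "Table name")) gives a string that can be passed as the second argument of sqlContext.read.jdbc.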

2) If you don't need to run the query on the source database, you can just pass the table name and then register the resulting DataFrame as a temporary table in the sqlContext:

val table = sqlContext.read.jdbc(url2, "dbName.dbo.TableName", prop)
table.registerTempTable("temp_table")

val df = sqlContext.sql("SELECT * FROM temp_table")
// or sqlContext.table("temp_table")
df.show()
