Not able to register RDD as TempTable
I am using IntelliJ and trying to get data from a MySQL DB and then write it into a Hive table. However, I am not able to register my RDD as a temp table. The error is "Cannot resolve symbol registerTempTable".
I know that this issue is caused by a missing import, but I am not able to find out which one.
I have been stuck on this issue for quite a long time and have tried all the options/answers available on Stack Overflow.
Below is my code:
import java.sql.Driver
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.JdbcRDD
import java.sql.{Connection, DriverManager, ResultSet}
import org.apache.spark.sql.hive.HiveContext

object JdbcRddExample {
  def main(args: Array[String]): Unit = {
    val url = "jdbc:mysql://localhost:3306/retail_db"
    val username = "retail_dba"
    val password = "cloudera"
    Class.forName("com.mysql.jdbc.Driver").newInstance
    // The SparkContext must exist before the SQL/Hive contexts that are built from it
    val conf = new SparkConf().setAppName("JDBC RDD").setMaster("local[2]").set("spark.executor.memory", "1g")
    val sc = new SparkContext(conf)
    val sqlContext = new org.apache.spark.sql.SQLContext(sc)
    val hiveContext = new HiveContext(sc)
    import hiveContext.implicits._
    // The two ? placeholders are filled with the lower/upper bounds (0, 999999999)
    val myRDD = new JdbcRDD(sc, () => DriverManager.getConnection(url, username, password),
      "select department_id,department_name from departments limit ?,?",
      0, 999999999, 1, r => r.getString("department_id") + ", " + r.getString("department_name"))
    myRDD.registerTempTable("My_Table") // error: Cannot resolve symbol registerTempTable
    sqlContext.sql("use my_db")
    sqlContext.sql("Create table my_db.depts (department_id INT, department_name String)")
  }
}
My SBT file (I believe I have imported all the artifacts):
name := "JdbcRddExample"
version := "0.1"
scalaVersion := "2.11.12"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "2.3.1",
  "org.apache.spark" %% "spark-sql" % "2.3.1",
  "org.apache.spark" %% "spark-streaming" % "2.3.1",
  "org.apache.spark" %% "spark-mllib" % "2.3.1",
  "org.apache.spark" %% "spark-hive" % "2.3.1" % "provided",
  "com.typesafe.scala-logging" %% "scala-logging" % "3.7.1",
  "org.apache.logging.log4j" % "log4j-api" % "2.11.0",
  "org.apache.logging.log4j" % "log4j-core" % "2.11.0",
  "mysql" % "mysql-connector-java" % "5.1.12"
)
Please point me to the exact imports that I am missing, or is there an alternate way?
As I mentioned before, I have tried all the solutions and nothing has worked so far.
To use Spark SQL, you need a DataFrame rather than an RDD: an RDD simply has no registerTempTable method, so no import will make it resolve. (On a DataFrame in Spark 2.x, registerTempTable is deprecated in favor of createOrReplaceTempView.)
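Converting the RDD to a DataFrame gives it createOrReplaceTempView. Below is a minimal sketch, assuming Spark 2.x with a local SparkSession. To keep it self-contained it builds the RDD with parallelize instead of the question's JdbcRDD; the sample rows and the names RddToTempView/registerDepartments are hypothetical. With JdbcRDD, return a tuple from the row mapper (for example r => (r.getInt("department_id"), r.getString("department_name"))) instead of a concatenated string, so that toDF can produce two typed columns.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}

object RddToTempView {
  // Converts an RDD of (id, name) tuples into a DataFrame and registers it
  // as a temporary view so it can be queried with Spark SQL.
  def registerDepartments(spark: SparkSession, rdd: RDD[(Int, String)]): DataFrame = {
    import spark.implicits._ // brings toDF into scope for RDDs of tuples
    val df = rdd.toDF("department_id", "department_name")
    df.createOrReplaceTempView("My_Table") // Spark 2.x replacement for registerTempTable
    df
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("RDD to temp view")
      .master("local[2]")
      .getOrCreate()

    // Hypothetical sample rows standing in for the JdbcRDD output.
    val rdd = spark.sparkContext.parallelize(Seq((2, "Fitness"), (3, "Footwear")))
    registerDepartments(spark, rdd)

    spark.sql("SELECT department_name FROM My_Table WHERE department_id = 3").show()
    spark.stop()
  }
}
```

The implicits are imported from the session inside the method so that toDF is available wherever a SparkSession is passed in; in the question's code, hiveContext.implicits._ plays the same role.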
You can quickly work around this by converting the RDD to a DataFrame; see, for example, How to convert rdd object to dataframe in spark. However, it is recommended to use Spark SQL's built-in support for reading a JDBC data source, like the examples here.
Sample code:
// Let Spark SQL read the MySQL table directly as a DataFrame
val dfDepartments = sqlContext.read.format("jdbc")
  .option("url", url)
  .option("driver", "com.mysql.jdbc.Driver")
  .option("dbtable", "(select department_id,department_name from departments) t")
  .option("user", username)
  .option("password", password).load()
// A DataFrame can be registered as a temporary view and queried with SQL
dfDepartments.createOrReplaceTempView("My_Table")
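Since the goal in the question is to write the MySQL data into Hive, here is a hedged sketch of the same JDBC read using the Spark 2.x SparkSession entry point, followed by saveAsTable. It assumes a reachable MySQL server with the credentials from the question and that spark-hive is on the runtime classpath (the SBT file marks it provided, which can leave it off the classpath when running from the IDE). JdbcToHiveSketch, jdbcOptions and readDepartments are illustrative names, not part of any library; my_db.depts is the table name from the question.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode, SparkSession}

object JdbcToHiveSketch {
  // Options for Spark's built-in JDBC data source.
  // A subquery used as "dbtable" needs an alias (here: t).
  def jdbcOptions(url: String, user: String, password: String): Map[String, String] = Map(
    "url" -> url,
    "driver" -> "com.mysql.jdbc.Driver",
    "dbtable" -> "(select department_id, department_name from departments) t",
    "user" -> user,
    "password" -> password
  )

  // Reads the departments subquery into a DataFrame.
  def readDepartments(spark: SparkSession, options: Map[String, String]): DataFrame =
    spark.read.format("jdbc").options(options).load()

  def main(args: Array[String]): Unit = {
    // enableHiveSupport fails at startup if the Hive classes are not on the classpath.
    val spark = SparkSession.builder()
      .appName("JDBC to Hive")
      .master("local[2]")
      .enableHiveSupport()
      .getOrCreate()

    val departments = readDepartments(spark,
      jdbcOptions("jdbc:mysql://localhost:3306/retail_db", "retail_dba", "cloudera"))

    // Temp view for ad-hoc SQL (the replacement for registerTempTable).
    departments.createOrReplaceTempView("My_Table")

    // Persist into a Hive table; Overwrite replaces the table if it already exists.
    spark.sql("CREATE DATABASE IF NOT EXISTS my_db")
    departments.write.mode(SaveMode.Overwrite).saveAsTable("my_db.depts")

    spark.stop()
  }
}
```

Letting saveAsTable create the table from the DataFrame's schema avoids keeping a hand-written CREATE TABLE statement in sync with the query's columns.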