
scala spark raises an error related to derby every time when doing toDF() or createDataFrame

I am new to Scala and the Scala API of Spark, and I recently tried Spark on my own computer, meaning I run Spark locally by setting SparkSession.builder().master("local[*]"). At first I succeeded in reading a text file using spark.sparkContext.textFile(). After getting the corresponding RDD, I tried to convert it to a Spark DataFrame, but failed again and again. Specifically, I used two methods: 1) toDF() and 2) spark.createDataFrame(). Both failed, and both gave me a similar error, shown below.
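For reference, a minimal sketch of the kind of code that triggers the error; the file path and column name are hypothetical placeholders:

import org.apache.spark.sql.SparkSession

object ToDfRepro {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("toDF-repro")
      .getOrCreate()

    // Reading the file into an RDD succeeds.
    val rdd = spark.sparkContext.textFile("data.txt") // hypothetical path

    import spark.implicits._ // needed for the toDF() conversion on RDDs

    // Method 1: toDF() -- fails with the Derby error when Hive support is on.
    val df1 = rdd.toDF("line") // "line" is a placeholder column name

    // Method 2: createDataFrame() -- fails the same way.
    val df2 = spark.createDataFrame(rdd.map(Tuple1(_))).toDF("line")

    df1.show()
    df2.show()
    spark.stop()
  }
}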

2018-10-16 21:14:27 ERROR Schema:125 - Failed initialising database.
Unable to open a test connection to the given database. JDBC url = jdbc:derby:;databaseName=metastore_db;create=true, username = APP. Terminating connection pool (set lazyInit to true if you expect to start your database after your app). Original Exception: ------
java.sql.SQLException: Failed to start database 'metastore_db' with class loader 
org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@199549a5, see the next exception for details.
at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
at org.apache.derby.impl.jdbc.Util.seeNextException(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedConnection.bootDatabase(Unknown Source)
at org.apache.derby.impl.jdbc.EmbedConnection.<init>(Unknown Source)
at org.apache.derby.jdbc.InternalDriver$1.run(Unknown Source)
at org.apache.derby.jdbc.InternalDriver$1.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at org.apache.derby.jdbc.InternalDriver.getNewEmbedConnection(Unknown Source)
at org.apache.derby.jdbc.InternalDriver.connect(Unknown Source)
at org.apache.derby.jdbc.InternalDriver.connect(Unknown Source)

I examined the error message; it seems the errors are related to apache.derby and that a connection to some database failed. I do not know what JDBC actually is. I am somewhat familiar with PySpark and I have never been asked to configure any JDBC database, so why does Scala-API Spark need it? What should I do to avoid this error? And why does a Scala-API Spark DataFrame need JDBC or any database while a Scala-API Spark RDD doesn't?

For future googlers: I googled for several hours and still had no idea how to get rid of this error, but the origin of the problem is clear: my SparkSession enables support for Hive, which then needs a metastore database. To solve this, we need to disable Hive support; since I am running Spark on my own Mac, it is fine to do so. So I downloaded the Spark source and built it myself with the command ./make-distribution.sh --name hadoop-2.6_scala-2.11 --tgz -Pyarn -Phadoop-2.6 -Dscala-2.11 -DskipTests, which omits -Phive and -Phive-thriftserver. I tested the self-built Spark; the metastore_db folder was never created, and so far so good.
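If you would rather not rebuild Spark, a lighter-weight variant of the same idea is to make sure the session never asks for the Hive catalog in the first place. This is a sketch under the assumption that you control the session builder and never call enableHiveSupport(); spark.sql.catalogImplementation is an internal Spark 2.x config, so treat it as best-effort rather than a guaranteed API:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("no-hive-metastore")
  // Request the in-memory catalog instead of the Hive metastore, so no
  // Derby-backed metastore_db is created (internal config in Spark 2.x;
  // assumed, not verified, to apply to your particular build).
  .config("spark.sql.catalogImplementation", "in-memory")
  .getOrCreate()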

For details, please refer to this post: Prebuilt Spark 2.1.0 creates metastore_db folder and derby.log when launching spark-shell


