
Tables available in Spark-SQL CLI are not available over thriftserver

I'm trying to expose my spark-sql tables over JDBC via thriftserver, but even though it looks like I've successfully connected, it's not working. Here's what I've tried so far.

database setup:

  • in pyspark I loaded a parquet file and created a temp view as tableX
  • performed a .saveAsTable as hive_tableX
  • then I queried that table: spark.sql("SELECT * FROM hive_tableX LIMIT 1").show(), which returned some data
  • at this point, my code is saving table information to the Hive metastore, right?

querying from spark-sql:

  • I then ran spark-sql and the spark-sql shell started up
  • USE default
  • show tables; --> I see my table in there, hive_tableX
  • SELECT * FROM hive_tableX LIMIT 1 and I see some successful results
  • thus, I believe it is now verified that my table has been saved in the Hive metastore, right?

then I start the thriftserver:

  • ./sbin/start-thriftserver.sh

next, I open beeline so I can test the thriftserver connection:

  • !connect jdbc:hive2://localhost:10000 (and enter username and password)
  • then I select the default db: use default;
  • and show tables; --> there's nothing there

So, where are my tables? Is beeline or the thriftserver pointing at a different warehouse or something?

Edit: I think my thriftserver isn't using the right warehouse directory, so I'm trying to start it with a config option:

  • [still nothing] sbin/start-thriftserver.sh --hiveconf spark.sql.warehouse.dir=/code/spark/thrift/spark-warehouse
  • [still nothing] sbin/start-thriftserver.sh --conf spark.sql.warehouse.dir=/code/spark/thrift/spark-warehouse

Edit: starting it in the same physical directory as where the warehouse was created seems to do the trick. Although, I don't know how to programmatically set the path to something else and start it elsewhere.

The solution to this particular problem was that I was starting the thriftserver from a different directory than the one where spark-warehouse and metastore_db were located.
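Why the start directory matters can be illustrated with a tiny sketch (an assumption about this setup: the default Derby-backed metastore is in use, and the directory names below are hypothetical):

```python
import os

# With the default Derby metastore, Spark resolves the relative names
# "metastore_db" and "spark-warehouse" against the current working
# directory of the process. A thriftserver launched from a different
# directory therefore creates and reads a brand-new, empty metastore
# instead of the one the pyspark session populated.
def default_metastore_dir(working_dir: str) -> str:
    # Mirrors the default Derby URL: jdbc:derby:;databaseName=metastore_db
    return os.path.join(working_dir, "metastore_db")

print(default_metastore_dir("/code/spark"))      # where the tables were created
print(default_metastore_dir("/opt/spark/sbin"))  # hypothetical thriftserver cwd
```

Two different working directories yield two different metastore locations, which is exactly the "my tables are missing" symptom described above.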

Once I started it from the correct directory, it worked as expected and my tables were available.
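To make the startup directory irrelevant (the part the question leaves open), one option, assuming the default Derby metastore, is to pin an absolute path in conf/hive-site.xml (the /code/spark paths below are illustrative, not from the question):

```xml
<!-- conf/hive-site.xml: a sketch with illustrative absolute paths -->
<configuration>
  <!-- Pin the Derby metastore database to an absolute location -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=/code/spark/metastore_db;create=true</value>
  </property>
</configuration>
```

Combined with passing the same absolute `--conf spark.sql.warehouse.dir=/code/spark/spark-warehouse` to both the pyspark/spark-sql sessions and `sbin/start-thriftserver.sh`, every process then points at the same metastore and warehouse no matter where it is launched from.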
