
Tables available in Spark-SQL CLI are not available over thriftserver

I'm trying to expose my spark-sql tables over JDBC via thriftserver, but even though it looks like I've successfully connected, it's not working. Here's what I've tried so far.

database setup:

  • in pyspark I loaded a parquet file and created a temp view as tableX
  • performed a .saveAsTable as hive_tableX
  • then I queried that table: spark.sql("SELECT * FROM hive_tableX LIMIT 1").show(), which returned some data
  • at this point, my code is saving table information to the Hive metastore, right?

querying from spark-sql:

  • I then ran spark-sql and the spark-sql shell started up
  • USE default
  • show tables; --> I see my table in there, hive_tableX
  • SELECT * FROM hive_tableX LIMIT 1 and I see some successful results
  • thus, I believe it is now verified that my table has been saved in the Hive metastore, right?

then I start the thriftserver:

  • ./sbin/start-thriftserver.sh

next, I open beeline so I can test the thriftserver connection:

  • !connect jdbc:hive2://localhost:10000 (and enter username and password)
  • then I select the default db: use default;
  • and show tables; --> there's nothing there

So, where are my tables? Is beeline or the thriftserver pointing at a different warehouse or something?

Edit: I think my thriftserver isn't using the right warehouse directory, so I'm trying to start it with a config option:

  • [still nothing] sbin/start-thriftserver.sh --hiveconf spark.sql.warehouse.dir=/code/spark/thrift/spark-warehouse
  • [still nothing] sbin/start-thriftserver.sh --conf spark.sql.warehouse.dir=/code/spark/thrift/spark-warehouse

Edit: starting it in the same physical directory as where the warehouse was created seems to do the trick. Although, I don't know how to programmatically set the path to something else and start it elsewhere.

The solution to this particular problem was that I was starting the thriftserver from a different directory than the one where spark-warehouse and metastore_db were located.
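Why the start directory matters can be illustrated with a tiny sketch (an assumption about this setup: the default Derby-backed metastore is in use, and the directory names below are hypothetical):

```python
import os

# With the default Derby metastore, Spark resolves the relative names
# "metastore_db" and "spark-warehouse" against the current working
# directory of the process. A thriftserver launched from a different
# directory therefore creates and reads a brand-new, empty metastore
# instead of the one the pyspark session populated.
def default_metastore_dir(working_dir: str) -> str:
    # Mirrors the default Derby URL: jdbc:derby:;databaseName=metastore_db
    return os.path.join(working_dir, "metastore_db")

print(default_metastore_dir("/code/spark"))      # where the tables were created
print(default_metastore_dir("/opt/spark/sbin"))  # hypothetical thriftserver cwd
```

Two different working directories yield two different metastore locations, which is exactly the "my tables are missing" symptom described above.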

Once I started it from the correct directory, it worked as expected and my tables were available.
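To make the startup directory irrelevant (the part the question leaves open), one option, assuming the default Derby metastore, is to pin an absolute path in conf/hive-site.xml (the /code/spark paths below are illustrative, not from the question):

```xml
<!-- conf/hive-site.xml: a sketch with illustrative absolute paths -->
<configuration>
  <!-- Pin the Derby metastore database to an absolute location -->
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:derby:;databaseName=/code/spark/metastore_db;create=true</value>
  </property>
</configuration>
```

Combined with passing the same absolute `--conf spark.sql.warehouse.dir=/code/spark/spark-warehouse` to both the pyspark/spark-sql sessions and `sbin/start-thriftserver.sh`, every process then points at the same metastore and warehouse no matter where it is launched from.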
