
Scala/Spark determine the path of external table

I have an external table on a GS bucket, and to implement some compaction logic I want to determine the full path on which the table was created.

val tableName="stock_ticks_cow_part"
val primaryKey="key"
val versionPartition="version"
val datePartition="dt"
val datePartitionCol=new org.apache.spark.sql.ColumnName(datePartition)

import spark.implicits._

val compactionTable = spark.table(tableName)
  .withColumnRenamed(versionPartition, "compaction_version")
  .withColumnRenamed(datePartition, "date_key")
compactionTable. <code for determining the path>

Let me know if anyone knows how to determine the table path in Scala.

I think you can use .inputFiles, which:

Returns a best-effort snapshot of the files that compose this Dataset

Be aware that this returns an Array[String], so you should loop through it to get all the information you're looking for.

So actually, just call

compactionTable.inputFiles

and look at each element of the Array.
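Since inputFiles returns one path per data file, one way to recover the table's root directory is to strip the file name and any partition directories from each path. This is a sketch, not part of the original answer: the tableRoot helper is hypothetical and assumes a Hive-style layout like gs://bucket/table/dt=2018-08-31/file.parquet.

```scala
// Hypothetical helper: strip the file name, then any `key=value`
// partition directories, to recover the table's root path.
def tableRoot(file: String): String = {
  val dir = file.substring(0, file.lastIndexOf('/'))        // drop the file name
  dir.split('/')
    .reverse
    .dropWhile(_.contains("="))                             // drop partition dirs
    .reverse
    .mkString("/")
}

println(tableRoot("gs://bucket/stock_ticks_cow_part/dt=2018-08-31/part-0000.parquet"))
// prints: gs://bucket/stock_ticks_cow_part
```

Applied to the DataFrame above, compactionTable.inputFiles.map(tableRoot).distinct should collapse to a single entry: the table's location.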

Here is the correct answer:


import org.apache.spark.sql.catalyst.TableIdentifier

// The session catalog is available from the SparkSession;
// `schema` here is the database name the table lives in.
lazy val tblMetadata = spark.sessionState.catalog
  .getTableMetadata(TableIdentifier(tableName, Some(schema)))

// In Spark 2.2+, `location` is a java.net.URI. Use `toString` to keep the
// scheme and bucket (e.g. gs://bucket/path) -- `getPath` would drop them.
lazy val tableLocation: String = tblMetadata.location.toString

You can use the SQL commands SHOW CREATE TABLE <tablename> or DESCRIBE FORMATTED <tablename>. Both should return the location of the external table, but they need some logic to extract this path from their output.
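For SHOW CREATE TABLE, the DDL comes back as a single string, so the extraction logic can be a regular expression over the LOCATION '...' clause. The sketch below is illustrative: the regex, the extractLocation helper, and the sample DDL are assumptions, not part of the original answer.

```scala
// Hypothetical extraction: pull the LOCATION clause out of the DDL string
// returned by SHOW CREATE TABLE.
val locationRe = "(?s).*LOCATION\\s+'([^']+)'.*".r

def extractLocation(ddl: String): Option[String] = ddl match {
  case locationRe(path) => Some(path)
  case _                => None
}

// In a Spark session, the DDL would come from:
//   val ddl = spark.sql(s"SHOW CREATE TABLE $tableName").head().getString(0)
val sampleDdl =
  """CREATE EXTERNAL TABLE stock_ticks_cow_part (key STRING)
    |LOCATION 'gs://bucket/stock_ticks_cow_part'""".stripMargin

println(extractLocation(sampleDdl))
// prints: Some(gs://bucket/stock_ticks_cow_part)
```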

See also: How to get the value of the location for a Hive table using a Spark object?

Use the DESCRIBE FORMATTED SQL command and collect the path back to the driver.

In Scala:

val location = spark.sql("DESCRIBE FORMATTED table_name").filter("col_name = 'Location'").select("data_type").head().getString(0)

The same in Python:

location = spark.sql("DESCRIBE FORMATTED table_name").filter("col_name = 'Location'").select("data_type").head()[0]
