Not able to get metadata information of the Delta Lake table using Spark
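I am trying to get the metadata of a Delta Lake table that was created from a DataFrame, specifically its version and timestamp information. I want to know how many versions exist and the timestamp of each.
I tried: spark.sql("describe deltaSample").show(10,false)
This returns only the column schema, with nothing about versions or timestamps: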
+--------+---------+-------+
|col_name|data_type|comment|
+--------+---------+-------+
|_c0 |string |null |
|_c1 |string |null |
+--------+---------+-------+
Here is the code:
// Launch spark-shell with the Delta Lake package
spark2-shell --packages io.delta:delta-core_2.11:0.2.0
val data = spark.read.csv("/xyz/deltaLake/deltaLakeSample.csv")
// Save the DataFrame in Delta format
data.write.format("delta").save("/xyz/deltaLake/deltaSample")
// Create the Delta Lake table
spark.sql("create table deltaSample using delta location '/xyz/deltaLake/deltaSample'")
val updatedInfo = data.withColumn("_c1",when(col("_c1").equalTo("right"), "updated").otherwise(col("_c1")) )
// Update the Delta Lake table
updatedInfo.write.format("delta").mode("overwrite").save("/xyz/deltaLake/deltaSample")
spark.read.format("delta").option("versionAsOf", 0).load("/xyz/deltaLake/deltaSample/").show(10,false)
+---+-----+
|_c0|_c1 |
+---+-----+
|rt |right|
|lt |left |
|bk |back |
|frt|front|
+---+-----+
spark.read.format("delta").option("versionAsOf", 1).load("/xyz/deltaLake/deltaSample/").show(10,false)
+---+-------+
|_c0|_c1 |
+---+-------+
|rt |updated|
|lt |left |
|bk |back |
|frt|front |
+---+-------+
// Get the metadata of the created table, including version and timestamp information
spark.sql("describe history deltaSample") // not working
org.apache.spark.sql.AnalysisException: Table or view was not found: history;
at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:47)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveRelations$$lookupTableFromCatalog(Analyzer.scala:733)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.resolveRelation(Analyzer.scala:685)
Expected output (for example, with version and timestamp columns added):
+---+-------+-------+-------------------+
|_c0|_c1    |Version|timestamp          |
+---+-------+-------+-------------------+
|rt |right  |0      |2019-07-22 00:24:00|
|lt |left   |0      |2019-07-22 00:24:00|
|rt |updated|1      |2019-08-22 00:25:60|
|lt |left   |1      |2019-08-22 00:25:60|
+---+-------+-------+-------------------+
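The ability to view the history of a Delta Lake table was included in the recently announced Delta Lake 0.3.0 release.
Currently you can do this with the Scala API; it is not yet possible in SQL, which is why "describe history" fails above. A Scala API example with 0.3.0: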
import io.delta.tables._
val deltaTable = DeltaTable.forPath(spark, pathToTable)
val fullHistoryDF = deltaTable.history() // get the full history of the table.
val lastOperationDF = deltaTable.history(1) // get the last operation.
The resulting fullHistoryDF looks similar to:
+-------+-------------------+------+--------+---------+--------------------+----+--------+---------+-----------+--------------+-------------+
|version| timestamp|userId|userName|operation| operationParameters| job|notebook|clusterId|readVersion|isolationLevel|isBlindAppend|
+-------+-------------------+------+--------+---------+--------------------+----+--------+---------+-----------+--------------+-------------+
| 5|2019-07-29 14:07:47| null| null| DELETE|[predicate -> ["(...|null| null| null| 4| null| false|
| 4|2019-07-29 14:07:41| null| null| UPDATE|[predicate -> (id...|null| null| null| 3| null| false|
| 3|2019-07-29 14:07:29| null| null| DELETE|[predicate -> ["(...|null| null| null| 2| null| false|
| 2|2019-07-29 14:06:56| null| null| UPDATE|[predicate -> (id...|null| null| null| 1| null| false|
| 1|2019-07-29 14:04:31| null| null| DELETE|[predicate -> ["(...|null| null| null| 0| null| false|
| 0|2019-07-29 14:01:40| null| null| WRITE|[mode -> ErrorIfE...|null| null| null| null| null| true|
+-------+-------------------+------+--------+---------+--------------------+----+--------+---------+-----------+--------------+-------------+
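To get closer to the expected output in the question, the history can be combined with time-travel reads. The following is only a sketch under some assumptions: it is run in a spark-shell started with --packages io.delta:delta-core_2.11:0.3.0 (the question used 0.2.0), the table lives at the path from the question, the schema did not change between versions, and every version listed in the history is still readable. The names tablePath, versions, perVersion and allVersions are illustrative and not part of any API.

```scala
import io.delta.tables.DeltaTable
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.lit

// Path taken from the question; adjust it to your own table location.
val tablePath = "/xyz/deltaLake/deltaSample"

// One row per commit, keeping only the columns the question asks about.
val history = DeltaTable.forPath(spark, tablePath).history()
val versions = history.select("version", "timestamp", "operation").orderBy("version")
versions.show(false)

// Read each version with time travel and tag its rows with
// the version number and the commit timestamp from the history.
val perVersion: Array[DataFrame] = versions.collect().map { row =>
  val version = row.getAs[Long]("version")
  val ts = row.getAs[java.sql.Timestamp]("timestamp")
  spark.read.format("delta")
    .option("versionAsOf", version)
    .load(tablePath)
    .withColumn("version", lit(version))
    .withColumn("timestamp", lit(ts))
}

// Stack all versions into one DataFrame (assumes an unchanged schema).
val allVersions = perVersion.reduce(_ union _)
allVersions.orderBy("version").show(false)
```

The count of versions is then simply versions.count(). Collecting the history to the driver is reasonable here because it holds one row per commit, but reading every version back is only practical for small tables like the sample in the question.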