
External non-partitioned Hive table shows 0 records even though the Parquet file has data; reading it with spark.read.parquet shows the correct data

I have an external non-partitioned Hive table created on Azure Databricks.

CREATE EXTERNAL TABLE reference.date_file(
  `date_init` string,
  `trade_date_minus_5` string,
  `trade_date_minus_4` string,
  `trade_date_minus_3` string,
  `trade_date_minus_2` string,
  `trade_date_minus_1` string,
  `trade_date` string,
  `trade_date_plus_1` string,
  `trade_date_plus_2` string,
  `trade_date_plus_3` string,
  `trade_date_plus_4` string,
  `trade_date_plus_5` string)
STORED AS PARQUET
LOCATION '/mnt/data/reference/data/DATE_FILE'
TBLPROPERTIES (
  'parquet.compress'='SNAPPY'
)

The location also contains a Parquet file, which I copied there from another location.

When I run select * from reference.date_file, it returns 0 results even though data/records are present.

But when I read the same file with...

val temp = spark.read.parquet("/mnt/data/reference/data/DATE_FILE")
temp.show()

... it returns the records.

Can someone please help me understand what the issue might be here?
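For anyone debugging the same symptom: before dropping and re-creating anything, it can help to confirm where the metastore thinks the table's data lives, and to flush any cached file listing Spark holds for the table. A minimal diagnostic sketch, assuming a live Spark session on Databricks (the table and path names come from the question; DESCRIBE FORMATTED and REFRESH TABLE are standard Spark SQL):

```scala
// Show the table's metadata, including the "Location" field registered
// in the Hive metastore -- compare it with the actual file path
spark.sql("DESCRIBE FORMATTED reference.date_file").show(100, truncate = false)

// Invalidate any cached metadata / file listing Spark may hold,
// in case an empty directory listing was cached earlier
spark.sql("REFRESH TABLE reference.date_file")
spark.sql("SELECT COUNT(*) AS cnt FROM reference.date_file").show()
```

If the registered location differs from the path that spark.read.parquet succeeds on, that mismatch is the likely cause.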

This is solved.

I simply changed the table's location to some other path, re-created the table, then restored the original location and re-created the table again. After that it started showing the data. There may have been some issue with the Hive metadata; changing the location and restoring it presumably corrected the metadata.
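As an alternative to the change-and-restore sequence above, re-pointing the existing table at its location with ALTER TABLE ... SET LOCATION and then refreshing it may reset the metadata without dropping the table. A sketch using the names from the question (both statements are standard Spark SQL; whether it fixes a given metastore inconsistency is not guaranteed):

```scala
// Re-point the table at the real data directory; this rewrites the
// location stored in the Hive metastore for reference.date_file
spark.sql("ALTER TABLE reference.date_file SET LOCATION '/mnt/data/reference/data/DATE_FILE'")

// Invalidate Spark's cached metadata/file listing so the next scan
// re-lists the directory and picks up the Parquet file
spark.sql("REFRESH TABLE reference.date_file")
spark.sql("SELECT * FROM reference.date_file").show()
```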
