
How to read an ORC transactional Hive table in Spark?

I am facing an issue while reading an ORC transactional table through Spark: I get the schema of the Hive table, but I am not able to read the actual data.

Here is the complete scenario:

hive> create table default.hello(id int, name string) clustered by (id)
      into 2 buckets STORED AS ORC TBLPROPERTIES ('transactional'='true');

hive> insert into default.hello values(10,'abc');

Now I am trying to access the Hive ORC data from Spark SQL, but it shows only the schema:

spark.sql("select * from hello").show()

Output: only the column headers id and name; no rows are returned.

Yes, as a workaround we can use compaction, but when the job is a micro-batch, compaction won't help, so I decided to use a JDBC call. Please refer to my answer for this issue, or see my GIT page - https://github.com/Gowthamsb12/Spark/blob/master/Spark_ACID
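A minimal sketch of that JDBC approach, for reference (the full version is on the GIT page above). This assumes HiveServer2 is reachable and the hive-jdbc driver is on Spark's classpath; the host, port, and credentials here are placeholders. Because the query runs inside Hive, which understands the ACID delta files, the rows come back even without compaction:

// Hypothetical sketch: read the ACID table through HiveServer2's JDBC
// endpoint instead of reading the ORC files directly.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:hive2://<hiveserver2-host>:10000/default")
  .option("driver", "org.apache.hive.jdbc.HiveDriver")
  .option("dbtable", "default.hello")
  .option("user", "<user>")
  .option("password", "<password>")
  .load()

df.show()

(One known quirk of the Hive JDBC driver is that column names may come back prefixed with the table name, e.g. hello.id.)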

Spark is not right now (as of version 2.3) fully compatible with Hive transactional tables: its native ORC reader does not understand the ACID base/delta directory layout, so it sees the table's metadata but reads no rows. The workaround is to run a compaction on the table after any transaction.

ALTER TABLE Hello COMPACT 'major';

The compaction rewrites the delta files into base files that Spark can read, so the data becomes visible after the compaction completes (it runs asynchronously and may take some time).
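You can watch the compaction's progress from the Hive shell with the standard SHOW COMPACTIONS command; the table becomes readable from Spark once its state reaches succeeded:

hive> SHOW COMPACTIONS;   -- wait until this table's compaction state is 'succeeded'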

You would also need an action at the end to force Spark to execute the query:

spark.sql("Select * From Hello").show()

(The default here is to show 20 rows)

or

spark.sql("Select * From Hello").take(2)

to see 2 rows of output data.

These are just examples of actions that can be taken on a DataFrame.
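For reference, a few other standard DataFrame actions that would equally force the query to execute:

val df = spark.sql("Select * From Hello")
df.count()     // returns the number of rows
df.first()     // returns the first row
df.collect()   // returns all rows to the driver as an Array[Row]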
