
How to read an ORC transactional Hive table in Spark?

I am facing an issue while reading an ORC transactional table through Spark: I get the schema of the Hive table, but I am not able to read the actual data.

Here is the complete scenario:

hive> create table default.hello(id int, name string) clustered by (id)
      into 2 buckets STORED AS ORC TBLPROPERTIES ('transactional'='true');

hive> insert into default.hello values(10,'abc');

Now I am trying to access the Hive ORC data from Spark SQL, but it shows only the schema:

spark.sql("select * from hello").show()

Output: only the column headers id and name; no rows are returned.

Yes, as a workaround we can use compaction, but when the job is a micro-batch, compaction won't help, so I decided to use a JDBC call. Please refer to my answer for this issue, or see my GIT page - https://github.com/Gowthamsb12/Spark/blob/master/Spark_ACID
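A minimal sketch of that JDBC approach, for reference (the full version is on the GIT page above). This assumes HiveServer2 is reachable and the hive-jdbc driver is on Spark's classpath; the host, port, and credentials here are placeholders. Because the query runs inside Hive, which understands the ACID delta files, the rows come back even without compaction:

// Hypothetical sketch: read the ACID table through HiveServer2's JDBC
// endpoint instead of reading the ORC files directly.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:hive2://<hiveserver2-host>:10000/default")
  .option("driver", "org.apache.hive.jdbc.HiveDriver")
  .option("dbtable", "default.hello")
  .option("user", "<user>")
  .option("password", "<password>")
  .load()

df.show()

(One known quirk of the Hive JDBC driver is that column names may come back prefixed with the table name, e.g. hello.id.)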

Spark is not right now (as of version 2.3) fully compatible with Hive transactional tables: its native ORC reader does not understand the ACID base/delta directory layout, so it sees the table's metadata but reads no rows. The workaround is to run a compaction on the table after any transaction.

ALTER TABLE Hello COMPACT 'major';

The compaction rewrites the delta files into base files that Spark can read, so the data becomes visible after the compaction completes (it runs asynchronously and may take some time).
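You can watch the compaction's progress from the Hive shell with the standard SHOW COMPACTIONS command; the table becomes readable from Spark once its state reaches succeeded:

hive> SHOW COMPACTIONS;   -- wait until this table's compaction state is 'succeeded'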

You would also need an action at the end to force Spark to execute the query:

spark.sql("Select * From Hello").show()

(The default here is to show 20 rows)

or

spark.sql("Select * From Hello").take(2)

to see 2 rows of output data.

These are just examples of actions that can be taken on a DataFrame.
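For reference, a few other standard DataFrame actions that would equally force the query to execute:

val df = spark.sql("Select * From Hello")
df.count()     // returns the number of rows
df.first()     // returns the first row
df.collect()   // returns all rows to the driver as an Array[Row]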
