
How to read a new table with data in spark-shell SQL?

I am new to the Spark shell and I am trying to add a new table and read it. I have added this file:

workers.txt:

1201, satish, 25
1202, krishna, 28
1203, amith, 39
1204, javed, 23
1205, prudvi, 23

and ran these commands:

spark-shell
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
sqlContext.sql("CREATE TABLE workers (id INT, name VARCHAR(64), age INT)")
sqlContext.sql("LOAD DATA LOCAL INPATH 'workers.txt' INTO TABLE workers")
 >> res5: org.apache.spark.sql.DataFrame = []
val resultW = sqlContext.sql("FROM workers SELECT *")
>> resultW: org.apache.spark.sql.DataFrame = [id: int, name: string ... 1 more field]
resultW.show()
>>
+----+----+----+                                                                
|  id|name| age|
+----+----+----+
|null|null|null|
|null|null|null|
|null|null|null|
|null|null|null|
|null|null|null|
+----+----+----+

But as you can see, the table contains only nulls. Why is that? The file workers.txt is in the same working directory.

The reason you get nulls in your DataFrame is that, in Spark:

The line separator handles all \r, \r\n and \n by default

That means a line such as

1201, satish, 25

is treated as one single value by Spark.

Spark then tries to fit that whole string into your declared type INT, which is not possible because the line contains commas, spaces, and letters. That is why every column comes back null, while the number of rows is still correct.
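If you want to keep the Hive-table approach, a minimal sketch of a fix (assuming the same workers.txt and the HiveContext named sqlContext from above) is to declare a comma field delimiter when creating the table, so each line is split into three fields instead of being forced into the first INT column:

// Recreate the table with an explicit comma field delimiter
sqlContext.sql("DROP TABLE IF EXISTS workers")
sqlContext.sql(
  """CREATE TABLE workers (id INT, name VARCHAR(64), age INT)
    |ROW FORMAT DELIMITED
    |FIELDS TERMINATED BY ','""".stripMargin)
sqlContext.sql("LOAD DATA LOCAL INPATH 'workers.txt' INTO TABLE workers")
sqlContext.sql("SELECT * FROM workers").show()

Note that the space after each comma in workers.txt still ends up in the values (e.g. " satish"), so the name and age columns may still need trimming; that is one more reason the CSV reader below is the cleaner option.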

The best way in your case is to read the file through the SparkSession CSV reader:

spark.read.option("delimiter", ",").csv("./workers.txt").show()

You can then cast the columns to the types you want or persist the result into a table, as shown below.
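For example, here is a minimal sketch (assuming Spark 2.x, where spark-shell already provides a SparkSession named spark, and that workers.txt is in the current directory; the column names and the table name workers_csv are just illustrative):

import spark.implicits._
import org.apache.spark.sql.functions.trim

// Read the CSV, name the columns, trim the stray spaces, and cast to the target types
val workers = spark.read
  .option("delimiter", ",")
  .csv("./workers.txt")
  .toDF("id", "name", "age")
  .select(
    $"id".cast("int").as("id"),
    trim($"name").as("name"),
    trim($"age").cast("int").as("age"))

workers.show()

// Optionally persist it so it can be queried with SQL later
workers.write.saveAsTable("workers_csv")

The explicit trim is needed because workers.txt has a space after every comma, which would otherwise get in the way of casting the age column to int.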
