
Test Spark SQL Queries Locally

Recently I have been working on a Spark application. As part of the project, a dataset is read from an HBase server, Spark SQL modifies the data, and the result is saved to Kafka.

The problem I am facing is that I can't test spark.sql locally. Every time, I have to submit the application jar and run it on the server. With plain SQL, we have tools to test all the queries in a local environment.

Is there a way, or are there other tools, to test Spark SQL locally by reading data from HBase?

I tried hbaseExplorer, but it does not solve the problem.

Thanks,

If you are talking about unit testing your Spark SQL queries, you can always create a Dataset locally and run queries against it:

scala> val df = List((1, false, 1.0),
     |   (2, true, 2.0)
     | ).toDF("col1", "col2", "col3")
df: org.apache.spark.sql.DataFrame = [col1: int, col2: boolean ... 1 more field]

scala> df.createOrReplaceTempView("myTable")

scala> spark.sql("select sum(col3) from myTable").show
+---------+
|sum(col3)|
+---------+
|      3.0|
+---------+
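The same idea works outside the shell, in a standalone test you can run from your IDE or build tool. This is only a sketch, assuming spark-sql is on the classpath; the object name and the query are made up for illustration:

```scala
import org.apache.spark.sql.SparkSession

object SparkSqlLocalTest {
  def main(args: Array[String]): Unit = {
    // Local SparkSession -- no cluster needed
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("spark-sql-local-test")
      .getOrCreate()
    import spark.implicits._

    // Build the same tiny dataset in memory
    val df = List((1, false, 1.0), (2, true, 2.0)).toDF("col1", "col2", "col3")
    df.createOrReplaceTempView("myTable")

    // Run the query under test and assert on the result
    val total = spark.sql("select sum(col3) from myTable").first().getDouble(0)
    assert(total == 3.0, s"expected 3.0, got $total")

    spark.stop()
  }
}
```

In a real project you would put this inside a ScalaTest (or similar) suite, but the pattern is the same: build small in-memory DataFrames that mimic your HBase schema, then run the production queries against them.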

Using Apache Phoenix

If you have access to Apache Phoenix, open spark-shell on your local machine and connect to Phoenix using its JDBC connection details.
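A sketch of what that connection could look like from a local spark-shell; the ZooKeeper hosts, port, and table name are placeholders you would replace with your cluster's values, and the Phoenix client jar must be on the driver classpath (e.g. `spark-shell --jars phoenix-client.jar`):

```scala
// Read a Phoenix-managed HBase table over JDBC (all connection
// details below are placeholders, not real cluster values).
val df = spark.read
  .format("jdbc")
  .option("driver", "org.apache.phoenix.jdbc.PhoenixDriver")
  .option("url", "jdbc:phoenix:zk-host1,zk-host2:2181")
  .option("dbtable", "MY_HBASE_TABLE") // hypothetical table name
  .load()

// Once loaded, test your Spark SQL exactly as you would in production
df.createOrReplaceTempView("myTable")
spark.sql("select count(*) from myTable").show()
```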

Using Direct Connection to HBase

You can also connect to HBase directly from your local spark-shell. This is somewhat difficult if your cluster is secured, i.e. Kerberos is enabled.
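One way to do the direct connection is through the standard HBase MapReduce input format, which Spark can consume via `newAPIHadoopRDD`. A sketch, with placeholder host and table names, assuming the hbase-client and hbase-mapreduce jars are on the spark-shell classpath (on a Kerberos-enabled cluster you would additionally need a valid ticket and the cluster's hbase-site.xml):

```scala
import org.apache.hadoop.hbase.HBaseConfiguration
import org.apache.hadoop.hbase.client.Result
import org.apache.hadoop.hbase.io.ImmutableBytesWritable
import org.apache.hadoop.hbase.mapreduce.TableInputFormat
import org.apache.hadoop.hbase.util.Bytes

// Placeholder connection details -- replace with your cluster's values
val conf = HBaseConfiguration.create()
conf.set("hbase.zookeeper.quorum", "zk-host1,zk-host2")
conf.set(TableInputFormat.INPUT_TABLE, "my_table")

// Scan the table as an RDD of (row key, result) pairs
val rdd = spark.sparkContext.newAPIHadoopRDD(
  conf,
  classOf[TableInputFormat],
  classOf[ImmutableBytesWritable],
  classOf[Result])

// Peek at a few row keys; from here you can map columns into a DataFrame
rdd.map { case (key, _) => Bytes.toString(key.get()) }.take(10).foreach(println)
```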

Using Export Sample Data (the easy way, and it will save a lot of time)

For testing purposes:

  1. Export sample data from HBase into JSON, CSV, or any other format you like.
  2. Download that data to your local system.
  3. In spark-shell, create a table with the same structure as your HBase table, using a command like spark.sql("CREATE TABLE HbaseTable ..")
  4. Load the downloaded sample data into a DataFrame.
  5. Write the DataFrame to the newly created table.
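For steps 1 and 2, one option is to do the export with Spark on the cluster itself. A sketch, assuming the HBase data is already loaded there in a DataFrame (`hbaseDf` is a placeholder name, and the paths are examples):

```scala
// On the cluster: keep a small sample of the HBase-backed DataFrame
// and write it out as JSON files (hbaseDf and the path are placeholders)
hbaseDf.limit(1000).write.json("/tmp/hbase_sample")
// Then copy the JSON files down to your workstation, e.g. with
// `hdfs dfs -get /tmp/hbase_sample /tmp/spark/data` or scp.
```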

Check the steps below for reference.


/tmp/spark > ls -ltr
total 0
drwxr-xr-x  14 srinivas  wheel  448 Nov 20 02:45 data
/tmp/spark > ls -ltr data
total 40
-rw-r--r--  1 srinivas  wheel  9 Nov 20 02:45 part-00000-4f5f5245-f664-426b-8204-a981871a1205-c000.json
-rw-r--r--  1 srinivas  wheel  9 Nov 20 02:45 part-00004-4f5f5245-f664-426b-8204-a981871a1205-c000.json
-rw-r--r--  1 srinivas  wheel  9 Nov 20 02:45 part-00002-4f5f5245-f664-426b-8204-a981871a1205-c000.json
-rw-r--r--  1 srinivas  wheel  9 Nov 20 02:45 part-00003-4f5f5245-f664-426b-8204-a981871a1205-c000.json
-rw-r--r--  1 srinivas  wheel  9 Nov 20 02:45 part-00001-4f5f5245-f664-426b-8204-a981871a1205-c000.json

Open spark-shell in path /tmp/spark

/tmp/spark > spark-shell

scala> val df = spark.read.json("/tmp/spark/data")
df: org.apache.spark.sql.DataFrame = [id: bigint]

scala> spark.sql("create table HBaseTable(id int) stored as orc")
res0: org.apache.spark.sql.DataFrame = []

scala> df.write.insertInto("HbaseTable")

scala> spark.sql("select * from HbaseTable").show(false)
+---+
|id |
+---+
|4  |
|3  |
|1  |
|5  |
|2  |
+---+
scala> :q
/tmp/spark > ls -ltr
total 8
drwxr-xr-x  14 srinivas  wheel  448 Nov 20 02:45 data
-rw-r--r--   1 srinivas  wheel  700 Nov 20 02:45 derby.log
drwxr-xr-x   9 srinivas  wheel  288 Nov 20 02:45 metastore_db
drwxr-xr-x   3 srinivas  wheel   96 Nov 20 02:46 spark-warehouse
/tmp/spark > ls -ltr spark-warehouse
total 0
drwxr-xr-x  12 srinivas  wheel  384 Nov 20 02:46 hbasetable
/tmp/spark > ls -ltr spark-warehouse/hbasetable
total 40
-rwxr-xr-x  1 srinivas  wheel  196 Nov 20 02:46 part-00002-5a3504cd-71c1-46fa-833f-76bf9178e46f-c000
-rwxr-xr-x  1 srinivas  wheel  196 Nov 20 02:46 part-00001-5a3504cd-71c1-46fa-833f-76bf9178e46f-c000
-rwxr-xr-x  1 srinivas  wheel  196 Nov 20 02:46 part-00003-5a3504cd-71c1-46fa-833f-76bf9178e46f-c000
-rwxr-xr-x  1 srinivas  wheel  196 Nov 20 02:46 part-00000-5a3504cd-71c1-46fa-833f-76bf9178e46f-c000
-rwxr-xr-x  1 srinivas  wheel  196 Nov 20 02:46 part-00004-5a3504cd-71c1-46fa-833f-76bf9178e46f-c000

Note - The next time you want to test against your HBase data, you have to open spark-shell from the same directory where you created the table (/tmp/spark here). It will not work if you open spark-shell in a different directory and try to access the HbaseTable table, because the Derby metastore and the spark-warehouse directory live in the directory where the shell was launched.
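Alternatively, you can pin the warehouse and metastore locations when launching the shell, so the table resolves no matter which directory you start from. A sketch with example paths (`derby.system.home` controls where Derby creates its `metastore_db` directory; adjust both paths to wherever you created the table):

```shell
# Launch spark-shell with fixed warehouse and Derby metastore paths
spark-shell \
  --conf spark.sql.warehouse.dir=/tmp/spark/spark-warehouse \
  --conf spark.driver.extraJavaOptions=-Dderby.system.home=/tmp/spark
```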
