Test Spark SQL Queries Locally
Recently I have been working on a Spark application: as part of the project, a dataset is read from an HBase server, Spark SQL transforms the data, and the result is written to Kafka.
The problem I am facing is that I can't test spark.sql locally. Every time, I have to submit the application jar and run it on the server.
With plain SQL we have tools to test all the queries in a local environment.
Is there a way, or are there other tools, to test Spark SQL locally while reading data from HBase?
I tried hbaseExplorer but it does not solve the problem.
Thanks,
If you are talking about unit testing your Spark SQL queries, you can always create a Dataset locally and run queries against it:
scala> val df = List((1, false, 1.0),
     |   (2, true, 2.0)
     | ).toDF("col1", "col2", "col3")
df: org.apache.spark.sql.DataFrame = [col1: int, col2: boolean ... 1 more field]
scala> df.createOrReplaceTempView("myTable") // registerTempTable is deprecated since Spark 2.0
scala> spark.sql("select sum(col3) from myTable").show
+---------+
|sum(col3)|
+---------+
| 3.0|
+---------+
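The same idea carries over to an automated test: spin up a local-mode SparkSession in your test code, build small Datasets in memory, and assert on the query results. A minimal sketch, assuming spark-sql is on the classpath (the object name and data are illustrative, not part of the original answer):

```scala
import org.apache.spark.sql.SparkSession

object SparkSqlQueryTest {
  def main(args: Array[String]): Unit = {
    // Local-mode session: runs entirely in-process, no cluster needed
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("spark-sql-local-test")
      .getOrCreate()
    import spark.implicits._

    // Stand-in for the data you would normally read from HBase
    val df = List((1, false, 1.0), (2, true, 2.0)).toDF("col1", "col2", "col3")
    df.createOrReplaceTempView("myTable")

    // Run the query under test and assert on its result
    val total = spark.sql("select sum(col3) from myTable").head.getDouble(0)
    assert(total == 3.0, s"expected 3.0, got $total")

    spark.stop()
  }
}
```

From here it is a small step to wrap the same pattern in ScalaTest or a similar framework, sharing one session across test cases.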
Using Apache Phoenix
If you have access to Apache Phoenix, open spark-shell on your local machine and connect to Apache Phoenix using its JDBC connection details.
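A sketch of what that connection might look like from spark-shell. The driver class is Phoenix's standard JDBC driver; the ZooKeeper host, port, and table name below are placeholders for your cluster, and the Phoenix client jar is assumed to have been passed via `--jars`:

```scala
// Connection details are placeholders; adjust to your Phoenix/HBase cluster.
val phoenixDf = spark.read
  .format("jdbc")
  .option("driver", "org.apache.phoenix.jdbc.PhoenixDriver")
  .option("url", "jdbc:phoenix:zookeeper-host:2181")
  .option("dbtable", "MY_SCHEMA.MY_TABLE")
  .load()

phoenixDf.createOrReplaceTempView("myHbaseTable")
spark.sql("select count(*) from myHbaseTable").show()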
Using a Direct Connection to HBase
You can also connect to HBase directly from your local spark-shell, though this is somewhat difficult if your cluster is secured or Kerberos is enabled.
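For reference, a direct read might look roughly like the following with the hbase-spark connector. This is a sketch only: the format name and column-mapping syntax vary by connector version, and the table name and columns below are placeholders.

```scala
// Assumes the hbase-spark connector jar and your cluster's hbase-site.xml
// are on the classpath; table and column mapping are placeholders.
val hbaseDf = spark.read
  .format("org.apache.hadoop.hbase.spark")
  .option("hbase.table", "my_table")
  .option("hbase.columns.mapping",
    "id STRING :key, col1 STRING cf1:col1")
  .load()

hbaseDf.createOrReplaceTempView("hbaseTable")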
Using Exported Sample Data
(the easy way, and it will save a lot of time too)
For testing purposes, export some sample data from your HBase table to json, csv, or any other format you like. Create a matching table locally with spark.sql("CREATE TABLE HbaseTable ..."), then write the DataFrame data into the newly created table. Check the steps below for reference.
/tmp/spark > ls -ltr
total 0
drwxr-xr-x 14 srinivas wheel 448 Nov 20 02:45 data
/tmp/spark > ls -ltr data
total 40
-rw-r--r-- 1 srinivas wheel 9 Nov 20 02:45 part-00000-4f5f5245-f664-426b-8204-a981871a1205-c000.json
-rw-r--r-- 1 srinivas wheel 9 Nov 20 02:45 part-00004-4f5f5245-f664-426b-8204-a981871a1205-c000.json
-rw-r--r-- 1 srinivas wheel 9 Nov 20 02:45 part-00002-4f5f5245-f664-426b-8204-a981871a1205-c000.json
-rw-r--r-- 1 srinivas wheel 9 Nov 20 02:45 part-00003-4f5f5245-f664-426b-8204-a981871a1205-c000.json
-rw-r--r-- 1 srinivas wheel 9 Nov 20 02:45 part-00001-4f5f5245-f664-426b-8204-a981871a1205-c000.json
Open spark-shell in the /tmp/spark directory:
/tmp/spark > spark-shell
scala> val df = spark.read.json("/tmp/spark/data")
df: org.apache.spark.sql.DataFrame = [id: bigint]
scala> spark.sql("create table HBaseTable(id int) stored as orc")
res0: org.apache.spark.sql.DataFrame = []
scala> df.write.insertInto("HbaseTable")
scala> spark.sql("select * from HbaseTable").show(false)
+---+
|id |
+---+
|4 |
|3 |
|1 |
|5 |
|2 |
+---+
scala> :q
/tmp/spark > ls -ltr
total 8
drwxr-xr-x 14 srinivas wheel 448 Nov 20 02:45 data
-rw-r--r-- 1 srinivas wheel 700 Nov 20 02:45 derby.log
drwxr-xr-x 9 srinivas wheel 288 Nov 20 02:45 metastore_db
drwxr-xr-x 3 srinivas wheel 96 Nov 20 02:46 spark-warehouse
/tmp/spark > ls -ltr spark-warehouse
total 0
drwxr-xr-x 12 srinivas wheel 384 Nov 20 02:46 hbasetable
/tmp/spark > ls -ltr spark-warehouse/hbasetable
total 40
-rwxr-xr-x 1 srinivas wheel 196 Nov 20 02:46 part-00002-5a3504cd-71c1-46fa-833f-76bf9178e46f-c000
-rwxr-xr-x 1 srinivas wheel 196 Nov 20 02:46 part-00001-5a3504cd-71c1-46fa-833f-76bf9178e46f-c000
-rwxr-xr-x 1 srinivas wheel 196 Nov 20 02:46 part-00003-5a3504cd-71c1-46fa-833f-76bf9178e46f-c000
-rwxr-xr-x 1 srinivas wheel 196 Nov 20 02:46 part-00000-5a3504cd-71c1-46fa-833f-76bf9178e46f-c000
-rwxr-xr-x 1 srinivas wheel 196 Nov 20 02:46 part-00004-5a3504cd-71c1-46fa-833f-76bf9178e46f-c000
Note - from next time onwards, if you want to do any testing on your HBase data, you have to open spark-shell from /tmp/spark, the same directory where you created the table. It will not work if you open spark-shell in a different directory and then try to access the HbaseTable table.
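If the working-directory dependence is a nuisance, one workaround is to pin the warehouse and metastore locations explicitly when building the session, so the table resolves no matter where spark-shell is launched. A hedged sketch, with example paths (the Derby connection URL assumes a Hive-enabled Spark build, which is what produced the metastore_db directory above):

```scala
import org.apache.spark.sql.SparkSession

// Fixed locations instead of the current working directory (paths are examples)
val spark = SparkSession.builder()
  .master("local[*]")
  .config("spark.sql.warehouse.dir", "/tmp/spark/spark-warehouse")
  .config("javax.jdo.option.ConnectionURL",
    "jdbc:derby:;databaseName=/tmp/spark/metastore_db;create=true")
  .enableHiveSupport()
  .getOrCreate()

spark.sql("select * from HbaseTable").show(false)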