
Connecting to Spark SQL on EMR using JDBC

Original post 2016-11-18 18:05:49. Tags: amazon-web-services, jdbc, pyspark, apache-spark-sql, emr

I have Spark running on EMR and have been trying, in vain, to connect to Spark SQL from SQL Workbench using the Hive JDBC drivers. I have started the Thrift server on the EMR cluster, and I'm able to connect to Hive on port 10000 (the default) from Tableau and SQL Workbench. When I run a query, it fires a Tez/Hive job. However, I want to run the query using Spark. On the EMR box itself, I'm able to connect to Spark SQL using beeline and run a query as a Spark job. The resource manager shows that the beeline query runs as a Spark job, while the query submitted through SQL Workbench runs as a Hive/Tez job.

When I checked the logs, I found that the Thrift server for Spark was running on port 10001 (the default). When I fire up beeline, log entries appear for the connection and for the SQL I'm running. However, when the same connection parameters are used to connect from SQL Workbench or Tableau, the attempt fails with an exception that gives few details; it just says the connection ended.

I also tried running the Thrift server on a custom port by passing the parameters; beeline works, but a JDBC connection does not.
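For readers following along, the two endpoints described above can be sketched in Python. This is only an illustration under stated assumptions: the `jdbc:hive2://` scheme is what beeline and the Hive JDBC driver use for HiveServer2-compatible servers, while the host name `emr-master.example.com`, the `default` database, and both helper function names are hypothetical and not taken from the post.

```python
import socket


def build_hive2_jdbc_url(host: str, port: int = 10001, database: str = "default") -> str:
    """Build a HiveServer2-style JDBC URL for the given host, port and database."""
    return f"jdbc:hive2://{host}:{port}/{database}"


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


if __name__ == "__main__":
    emr_master = "emr-master.example.com"  # hypothetical host name (assumption)
    # Port 10000: the Hive endpoint from the post (queries run as Tez/Hive jobs).
    print(build_hive2_jdbc_url(emr_master, 10000))
    # Port 10001: the Spark Thrift server endpoint from the post.
    print(build_hive2_jdbc_url(emr_master, 10001))
```

Calling `port_is_open(emr_master, 10001)` from the client machine is a quick way to tell a network or firewall problem apart from a server-side failure, since a JDBC tool that only reports "connection ended" does not distinguish the two.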

Any help to resolve this issue?

1 answer

I was able to resolve the issue and can now connect to Spark SQL from Tableau. The reason I could not connect was that we were bringing up the Thrift service as root. I'm not sure why that matters, but I had to change the permissions on the log folder to the current user (not root) and then bring up the Thrift service, which let me connect without any issues.
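The check implied by this answer can be sketched as a small preflight script, run as the user who will start the Thrift service. It is a minimal sketch, not the answerer's actual procedure: the log directory `/var/log/spark` and all function names are assumptions for illustration, so substitute the folder your Thrift server really writes to.

```python
import os


def is_running_as_root() -> bool:
    """True when the effective user id is 0 (root) on a POSIX system."""
    return os.geteuid() == 0


def can_write_logs(log_dir: str) -> bool:
    """True if log_dir is an existing directory the current user can create files in."""
    return os.path.isdir(log_dir) and os.access(log_dir, os.W_OK | os.X_OK)


def preflight(log_dir: str) -> list:
    """Collect the problems described in the answer before starting the Thrift service."""
    problems = []
    if is_running_as_root():
        problems.append("running as root; the answer started the service as a non-root user")
    if not can_write_logs(log_dir):
        problems.append(f"{log_dir} is not writable by the current user")
    return problems


if __name__ == "__main__":
    log_dir = "/var/log/spark"  # hypothetical log folder (assumption), adjust as needed
    for problem in preflight(log_dir):
        print("WARNING:", problem)
```

An empty result from `preflight` only means the two conditions mentioned in the answer are satisfied; it does not guarantee that the JDBC connection will succeed.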

Using Postgresql JDBC source with Apache Spark on EMR

Using Jupyter notebook on Spark on EMR

Can't access JDBC driver through spark with AWS EMR instance

How to configure Java client connecting to AWS EMR spark cluster

Running an EMR Spark script, and the Spark UI SQL tab disappears

Write to a file in S3 using Spark on EMR

How to install Spark, Hadoop on EMR using Terraform?

Spark SQL error from EMR notebook with AWS Glue table partition

AWS EMR Spark error with `Failed to load class of driverClassName com.mysql.jdbc.Driver`

Pass comma separated argument to spark jar in AWS EMR using CLI


The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.


solution1 0 2016-11-22 21:24:21
