
Pyspark code or steps to connect to a Presto SQL catalog and execute a query on a PostgreSQL database using pyspark?

I have pyspark configured to work with PostgreSQL directly. However, I want to pass data from Spark to Presto using the JDBC connector, and then run the query on PostgreSQL using pyspark and Presto. How can I do that in code?

import sys
sys.path.append('/usr/local/lib/python3.6/dist-packages')

import requests
import json, ast
from pyspark import SparkContext, SparkConf
from pyspark.sql import SparkSession, SQLContext

sc = SparkContext.getOrCreate()
sqlContext = SQLContext(sc)
spark = SparkSession.builder \
    .master("local") \
    .appName("jdbc data sources") \
    .config("spark.sql.shuffle.partitions", "4") \
    .getOrCreate()

driver = "io.prestosql.jdbc.PrestoDriver"
# path = "//host:port/prestosql/?user=<username>&password=<passwd>"
path = "//host:port/prestosql<catalog>"
url = "jdbc:presto:" + path
tablename = "<tablename>"  # placeholder

# Read through the Presto JDBC driver; "<select query>" is a placeholder
dbDataFrame = spark.read.format("jdbc") \
    .option("url", url) \
    .option("dbtable", "<select query>") \
    .option("driver", driver) \
    .load()

What am I doing wrong? I want to run a select query on PostgreSQL via Presto and pass the result back to Spark using pyspark.

I am getting the following error:

in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o53.load.
: java.sql.SQLException: Authentication using username/password requires SSL to be enabled
    at io.prestosql.jdbc.PrestoDriverUri.setupClient(PrestoDriverUri.java:160)
    at io.prestosql.jdbc.PrestoDriver.connect(PrestoDriver.java:91)
    at org.apache.spark.sql.execution.datasources.jdbc.DriverWrapper.connect(DriverWrapper.scala:45)

When I enable .option("SSL","true"), I get a new error:

py4j.protocol.Py4JJavaError: An error occurred while calling o84.load.
: java.sql.SQLException: Error executing query
    at io.prestosql.jdbc.PrestoStatement.internalExecute(PrestoStatement.java:284)
    at io.prestosql.jdbc.PrestoStatement.execute(PrestoStatement.java:229)
    at io.prestosql.jdbc.PrestoPreparedStatement.<init>(PrestoPreparedStatement.java:80

What am I doing wrong? Please help.

I suspect there is an error in your SQL query. The preferred syntax is something like .option("dbtable","(select * from sample_table)a").load()
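To make the suggestion above concrete, here is a minimal sketch of how the reader options might be assembled. Spark places the `dbtable` value inside a `FROM` clause, so a bare `SELECT` statement typically fails unless it is wrapped in parentheses and given an alias. The helper names `presto_jdbc_options` and `read_via_presto`, the default alias `q`, and the host, catalog, and schema values in the usage note are all hypothetical. The sketch also assumes the usual Presto URL shape `jdbc:presto://host:port/catalog/schema` and that Spark forwards the `user`, `password`, and `SSL` options to the driver as connection properties, as the question's second attempt suggests.

```python
def presto_jdbc_options(host, port, catalog, schema, query,
                        user, password=None, alias="q"):
    """Build Spark JDBC reader options for a Presto query (hypothetical helper)."""
    # Assumed URL shape: jdbc:presto://host:port/catalog/schema
    url = f"jdbc:presto://{host}:{port}/{catalog}/{schema}"

    # Spark embeds "dbtable" in a FROM clause, so wrap the query as an
    # aliased subquery and drop any trailing semicolon.
    dbtable = f"({query.strip().rstrip(';')}) {alias}"

    options = {
        "url": url,
        "driver": "io.prestosql.jdbc.PrestoDriver",
        "dbtable": dbtable,
        "user": user,
    }
    if password is not None:
        # The driver rejects password authentication without SSL
        # (see the first error in the question).
        options["password"] = password
        options["SSL"] = "true"
    return options


def read_via_presto(spark, **kwargs):
    """Load the query result as a DataFrame using an existing SparkSession."""
    return spark.read.format("jdbc").options(**presto_jdbc_options(**kwargs)).load()
```

With a running session this might be called as `read_via_presto(spark, host="presto-host", port=8443, catalog="postgresql", schema="public", query="select * from sample_table", user="<username>", password="<passwd>")`. Depending on the server's certificate, the driver may also need trust store settings, which this sketch does not cover.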
