How to connect to Greenplum Database remotely from PySpark in Jupyter Notebook?
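I am trying to connect PySpark (in a Jupyter Notebook) to a Greenplum Database instance running on Oracle VM VirtualBox over a JDBC connection. I get the following error even though I know the password is correct: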
Py4JJavaError: An error occurred while calling o424.load.
: org.postgresql.util.PSQLException: FATAL: password authentication failed for user "user2"
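I have tried:
Reading the Greenplum DB documentation on connecting with PySpark
Changing the PostgreSQL connection settings in the pg_hba.conf, sshd_config and postgresql.conf files
Using the pyspark shell and loading the .jar file:
pyspark --jars 'path to .jar file'
and then running the code below
The PySpark code in the Jupyter notebook is: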
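import findspark
# Make the local Spark 2.4.1 install importable from the notebook
findspark.init('spark-2.4.1-bin-hadoop2.7')
from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
# Options for the Greenplum-Spark connector ("greenplum" data source)
option = {
    'url': "jdbc:postgresql://localhost:5432/tutorial",  # JDBC URL of the Greenplum master
    'user': "user2",
    'password': "SECRET",
    'dbschema': "faa",               # schema that contains the table
    'dbtable': "otp_c",              # table to read
    'partitionColumn': "airlineid"   # column used to split the read into partitions
}
gpdf = spark.read.format('greenplum').options(**option).load()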
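The PostgreSQL JDBC driver connects over TCP by default. A connection made through the `url` option is therefore matched against `host` records in pg_hba.conf, and never against `local` (Unix-socket) records. The host part of the URL also decides which client address the server sees. Below is a minimal sketch that splits such a URL into host, port and database with the standard library. The function name `parse_postgres_jdbc_url` is hypothetical, and the 5432 fallback assumes the default port.

```python
from urllib.parse import urlparse


def parse_postgres_jdbc_url(url: str):
    """Split a jdbc:postgresql:// URL into (host, port, database)."""
    prefix = "jdbc:"
    if not url.startswith(prefix):
        raise ValueError("expected a URL starting with 'jdbc:'")
    # Drop the "jdbc:" prefix so urlparse sees an ordinary postgresql:// URL
    parsed = urlparse(url[len(prefix):])
    if parsed.scheme != "postgresql":
        raise ValueError(f"unexpected scheme: {parsed.scheme!r}")
    host = parsed.hostname or "localhost"
    port = parsed.port or 5432  # assumption: default PostgreSQL/Greenplum port
    database = parsed.path.lstrip("/")
    return host, port, database


if __name__ == "__main__":
    host, port, db = parse_postgres_jdbc_url("jdbc:postgresql://localhost:5432/tutorial")
    # A TCP connection to this host is checked against "host" lines in pg_hba.conf
    print(host, port, db)  # localhost 5432 tutorial
```

With `localhost` in the URL, the notebook and the database must run on the same machine, or the port must be forwarded into the VM. Otherwise the connection never reaches the Greenplum master at all.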
Pivotal Greenplum provides a connector .jar file for the JDBC connection to the database, which I have placed at spark-2.4.1-bin-hadoop2.7/jars/greenplum-spark_2.11-1.6.0.jar
In addition, pg_hba.conf in Greenplum DB is configured as:
# If you want to allow non-local connections, you need to add more
# "host" records. In that case you will also need to make PostgreSQL
# listen on a non-local interface via the listen_addresses
# configuration parameter, or via the -i or -h command line switches.
# CAUTION: Configuring the system for local "trust" authentication allows
# any local user to connect as any PostgreSQL user, including the database
# superuser. If you do not trust all your local users, use another
# authentication method.
# TYPE DATABASE USER CIDR-ADDRESS METHOD
# "local" is for Unix domain socket connections only
# IPv4 local connections:
# IPv6 local connections:
local all gpadmin ident
host all gpadmin 127.0.0.1/28 trust
host all gpadmin 10.0.2.15/32 trust
host all gpadmin ::1/128 trust
host all gpadmin fe80::a00:27ff:fe84:1f3f/128 trust
local replication gpadmin ident
host replication gpadmin samenet trust
local gpperfmon gpmon md5
host all gpmon 127.0.0.1/28 md5
local tutorial +users trust
host tutorial +users trust
host all all 0.0.0.0/0 md5
#local all all md5
#local all user2 ident
The sshd_config file is configured with:
PasswordAuthentication yes
Finally, the postgresql.conf file is configured as:
# - Connection Settings -
listen_addresses = '*' # what IP address(es) to listen on;
# comma-separated list of addresses;
# defaults to '*', '*' = all
# (change requires restart)
port=5432 ##port = 5432 # sets the database listener port for
# a Greenplum instance. The master and
# each segment has its own port number.
# note: Port numbers for the Greenplum system must also be changed in the
# gp_configuration catalog. See the Greenplum Database Administrator Guide
# for instructions!
#
#
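I expected to connect to Greenplum DB and run SQL queries with PySpark, but instead I get the Py4JJavaError above.
I am not sure what other options there are. Ideally I would like to connect from a Jupyter notebook. Please help!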
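In pg_hba.conf, a host record requires a CIDR address. This line
host tutorial +users trust
does not take effect, so the connection falls through to the last line (md5), which asks for a password.
You can create the role user2 with a password inside the Greenplum cluster.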
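The first-match rule behind this answer can be sketched in Python. Records are checked from top to bottom, and the first record whose type, database, user and address all match decides the authentication method.

This is a simplified, hypothetical model:
- `match_rule` is an illustrative name, and `user_groups` stands in for the roles that user2 belongs to.
- It ignores keywords such as `samenet`, the `hostssl`/`hostnossl` types and the IP-plus-mask address form.
- It simply skips a malformed `host` record, as the answer describes. How a real server treats an invalid line depends on its version, so check the server log after a reload.

```python
import ipaddress
from typing import Optional, Set, Tuple

# Non-comment records from the question's pg_hba.conf, in file order.
HBA_RULES = """
local all gpadmin ident
host all gpadmin 127.0.0.1/28 trust
host all gpadmin 10.0.2.15/32 trust
host all gpadmin ::1/128 trust
host all gpadmin fe80::a00:27ff:fe84:1f3f/128 trust
local replication gpadmin ident
host replication gpadmin samenet trust
local gpperfmon gpmon md5
host all gpmon 127.0.0.1/28 md5
local tutorial +users trust
host tutorial +users trust
host all all 0.0.0.0/0 md5
"""


def match_rule(conn_type: str, database: str, user: str,
               client_ip: Optional[str], user_groups: Set[str],
               rules: str = HBA_RULES) -> Optional[Tuple[int, str]]:
    """Return (record number within `rules`, auth method) of the first match, or None."""
    for number, line in enumerate(rules.strip().splitlines(), start=1):
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue
        kind = fields[0]
        # local: TYPE DATABASE USER METHOD
        # host:  TYPE DATABASE USER ADDRESS METHOD
        if len(fields) < (4 if kind == "local" else 5):
            continue  # malformed, e.g. a host record without an address: skipped here
        if kind != conn_type:
            continue
        database_field, user_field = fields[1], fields[2]
        if database_field not in ("all", database):
            continue
        if user_field.startswith("+"):
            # "+name" matches members of the role "name"
            if user_field[1:] not in user_groups:
                continue
        elif user_field not in ("all", user):
            continue
        if kind == "host":
            try:
                # strict=False accepts entries such as 127.0.0.1/28 that have host bits set
                network = ipaddress.ip_network(fields[3], strict=False)
            except ValueError:
                continue  # keywords such as "samenet" are not modeled in this sketch
            if client_ip is None or ipaddress.ip_address(client_ip) not in network:
                continue
            return number, fields[4]
        return number, fields[3]
    return None


if __name__ == "__main__":
    # JDBC from a notebook on the same machine: a TCP ("host") connection from 127.0.0.1
    print(match_rule("host", "tutorial", "user2", "127.0.0.1", {"users"}))  # (12, 'md5')
```

Suppose the `host tutorial +users` line is given an address such as 127.0.0.1/32, and user2 is a member of `users`. The sketch then picks that `trust` record instead of the final `md5` one.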
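When the final `md5` record is the one that matches, the login succeeds only if the client's hashed password agrees with the hash stored for the role. With md5 password encryption, the two values are built as follows:
- The stored value is `md5` followed by the hex MD5 of the password concatenated with the role name.
- The client answers the server's 4-byte salt with `md5` plus the MD5 of that hex digest concatenated with the salt.

Below is a minimal sketch of that check using `hashlib`. The function names are hypothetical, and `SECRET` is the placeholder password from the question.

```python
import hashlib
from typing import Optional


def stored_md5_password(password: str, role: str) -> str:
    """Value kept in pg_authid.rolpassword when the password is md5-encrypted."""
    return "md5" + hashlib.md5((password + role).encode("utf-8")).hexdigest()


def md5_auth_response(password: str, role: str, salt: bytes) -> str:
    """Response the client sends to the server's md5 challenge."""
    if len(salt) != 4:
        raise ValueError("the md5 challenge salt is exactly 4 bytes")
    inner = hashlib.md5((password + role).encode("utf-8")).hexdigest()
    return "md5" + hashlib.md5(inner.encode("ascii") + salt).hexdigest()


def server_accepts(stored: Optional[str], response: str, salt: bytes) -> bool:
    """Server-side check: rehash the stored digest with the salt and compare."""
    if stored is None:
        # The role has no password, so md5 authentication can never succeed
        return False
    expected = "md5" + hashlib.md5(stored[3:].encode("ascii") + salt).hexdigest()
    return response == expected


if __name__ == "__main__":
    salt = b"\x01\x02\x03\x04"  # hypothetical salt; the server picks a random one
    stored = stored_md5_password("SECRET", "user2")
    print(server_accepts(stored, md5_auth_response("SECRET", "user2", salt), salt))  # True
    print(server_accepts(stored, md5_auth_response("wrong", "user2", salt), salt))   # False
    print(server_accepts(None, md5_auth_response("SECRET", "user2", salt), salt))    # False
```

A superuser can inspect the stored value with `SELECT rolpassword FROM pg_authid WHERE rolname = 'user2';`. A NULL there means the role has no password, which also surfaces on the client as a password authentication failure. Setting one with `ALTER ROLE user2 WITH PASSWORD '...';` fixes that.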