How to connect to Redshift from AWS Glue (PySpark)?

I am trying to connect to Redshift and run simple queries from a Glue DevEndpoint (that is a requirement), but I cannot seem to connect.

The following code just times out:

df = spark.read \
  .format('jdbc') \
  .option("url", "jdbc:redshift://my-redshift-cluster.c512345.us-east-2.redshift.amazonaws.com:5439/dev?user=myuser&password=mypass") \
  .option("query", "select distinct(tablename) from pg_table_def where schemaname = 'public'; ") \
  .option("tempdir", "s3n://test") \
  .option("aws_iam_role", "arn:aws:iam::147912345678:role/my-glue-redshift-role") \
  .load()

What could be the reason?

I checked the URL, user, and password, and also tried different IAM roles, but every time it just hangs.

I also tried without an IAM role (just the URL, user/password, and a schema/table that already exists there), and it also hangs/times out:

jdbcDF = spark.read \
    .format("jdbc") \
    .option("url", "jdbc:redshift://my-redshift-cluster.c512345.us-east-2.redshift.amazonaws.com:5439/dev") \
    .option("dbtable", "public.test") \
    .option("user", "myuser") \
    .option("password", "mypass") \
    .load()

Reading data from S3 or from Glue catalog tables (directly in the Glue SSH terminal) works fine, so I know Spark and DataFrames are fine; there is just something wrong with the connection to Redshift, but I am not sure what.

[Screenshot: creating a Glue job]

Select the last option while creating the Glue job. On the next screen, it will ask you to select a Glue connection.
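
For reference, a minimal sketch of what the read can look like inside a job created that way, using GlueContext's Redshift connection type. This assumes the attached Glue connection provides the VPC/subnet/security-group access to the cluster; the endpoint, credentials, and table reuse the question's placeholders, and the redshiftTmpDir path is an assumed s3:// staging location:

# Sketch: read a Redshift table from a Glue job that has a Glue connection
# attached (the connection supplies network access to the cluster).
from pyspark.context import SparkContext
from awsglue.context import GlueContext

glue_context = GlueContext(SparkContext.getOrCreate())

dyf = glue_context.create_dynamic_frame.from_options(
    connection_type="redshift",
    connection_options={
        "url": "jdbc:redshift://my-redshift-cluster.c512345.us-east-2.redshift.amazonaws.com:5439/dev",
        "user": "myuser",
        "password": "mypass",
        "dbtable": "public.test",
        # Staging location for UNLOAD/COPY; assumed s3:// path, not from the question
        "redshiftTmpDir": "s3://test/redshift-temp/",
    },
)
df = dyf.toDF()  # plain Spark DataFrame from here on
df.show()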

You seem to be on the correct path. I connect to and query Redshift from a Glue PySpark job the same way, except for a minor change of using

.format("com.databricks.spark.redshift") 

I have also successfully used

.option("forward_spark_s3_credentials", "true")

instead of

.option("iam_role", "my_iam_role")
