How do I read data from Redshift in an AWS Glue Spark job?

Question

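I'm new to AWS Glue and need the community's help. I have a table in Redshift and want to iterate over the dataset returned by a select query in a Glue Spark job. I've written the code below, but I can't figure out how to retrieve all the row and column values from the DataFrame. I'm also getting this error: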

IllegalArgumentException: requirement failed: The number of columns doesn't match.

Here is my sample code:

from pyspark.context import SparkContext 
from pyspark.sql import SQLContext
from awsglue.dynamicframe import DynamicFrame
from awsglue.context import GlueContext

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session

t = glueContext.read.format("jdbc").option("url","myurl").option("user","myuser").option("password",mypwd).option("dbtable","(select id, years, months, days from etl.mytable where id=1) as t1").load()
print(t)
t= t.toDF()
for row in t:
  print(row)

Answer 1

The problem is that you are calling .toDF() on t, which is already a DataFrame, not a DynamicFrame.
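As far as I can tell, the error message comes from the fact that DataFrame.toDF() in Spark renames columns: called with no names on a four-column DataFrame, the counts do not match. Below is a minimal sketch of the read without that call, plus one way to get at the row values. The helper names read_query and rows_as_dicts are hypothetical (not part of Glue or Spark), and the URL and credentials are the placeholders from the question.

```python
def read_query(spark, url, user, password, query):
    """Run `query` over JDBC and return the result as a Spark DataFrame.

    `spark` is assumed to be a SparkSession, e.g. glueContext.spark_session.
    """
    return (
        spark.read.format("jdbc")
        .option("url", url)
        .option("user", user)
        .option("password", password)
        # JDBC sources accept a parenthesised subquery with an alias as "dbtable".
        .option("dbtable", f"({query}) as t1")
        .load()  # already a DataFrame, so no .toDF() afterwards
    )


def rows_as_dicts(df):
    """Collect every row to the driver and return a list of {column: value} dicts."""
    return [row.asDict() for row in df.collect()]


# Hypothetical usage inside the Glue job from the question:
#
#   spark = glueContext.spark_session
#   t = read_query(spark, "myurl", "myuser", mypwd,
#                  "select id, years, months, days from etl.mytable where id=1")
#   for record in rows_as_dicts(t):
#       print(record["id"], record["years"], record["months"], record["days"])
```

collect() pulls the whole result onto the driver, which should be fine for a query filtered to a single id. For larger results, df.toLocalIterator() or a distributed operation is probably a better fit.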

Solution 1
0 2022-02-01 21:08:45
