RDS Table update using Dynamic Frame in AWS Glue

I have a Glue job that inserts data from a CSV file into a PostgreSQL table. Now I need to update one row in that table. I tried this, but the row is added as a new entry instead of updating the existing one. How can I do this? Please help...

Glue currently does not support overwrite mode. You would need to convert your DynamicFrame to a DataFrame and then write with mode = overwrite, like this:

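(
    dynamic_frame.toDF()
    .write
    .mode("overwrite")  # replaces the existing contents of the target table
    .jdbc(url=jdbc_url, table=table_name, properties=connection_properties)  # placeholder names, not from the original answer
)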

Spark does not support updating records yet. You can only overwrite (delete the existing records and add new ones) or append (add new records alongside the existing ones). However, if you want to update a particular row, you can use the Python library pg8000. The steps for Glue version 2.0 and Python 3 are as follows:

  1. Download and extract the pg8000, asn1crypto and scramp tar files, then zip all of them into one file.

  2. Upload the zip file to an S3 bucket.

  3. In the job's Python library path, add the path of the zipped pg8000 file, e.g. s3://bucketname/foldername/pg8000-1.19.2.zip

  4. Import only pg8000.native and use the code below to connect to the database directly.

    import ssl

    import pg8000.native

    # Replace the placeholder values (database, host, port, user, password) with your own.
    # Note: ssl._create_unverified_context() disables certificate verification.
    conn = pg8000.native.Connection(database="database", host="xxxxxxrds.amazonaws.com", port=xxxx, user="user", password="password", ssl_context=ssl._create_unverified_context())

    update_query = "your update query that you would generally write in postgresql"

    conn.run(update_query)

    conn.run("COMMIT")

    conn.close()
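
Steps 1 to 3 above (bundling the extracted packages into one zip) could be scripted roughly as follows. This is only a sketch using the standard library: the function name `bundle_packages` and the directory names in the usage comment are hypothetical, and it assumes you pass the importable package folders themselves (for example the inner `pg8000` folder, not the versioned folder the tarball extracts to), so that each package sits at the top level of the zip.

```python
import zipfile
from pathlib import Path


def bundle_packages(package_dirs, zip_path):
    """Write several package directories into a single zip archive.

    Each package keeps its own folder name as the top-level entry
    (e.g. "pg8000/native.py"), so "import pg8000.native" can resolve
    once the zip is on the Python library path.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for package_dir in map(Path, package_dirs):
            for file_path in sorted(package_dir.rglob("*")):
                if file_path.is_file():
                    # Archive name is relative to the package's parent folder.
                    arcname = file_path.relative_to(package_dir.parent).as_posix()
                    archive.write(file_path, arcname)
    return zip_path


# Hypothetical usage, assuming the three packages were extracted under ./extracted:
# bundle_packages(
#     ["extracted/pg8000", "extracted/asn1crypto", "extracted/scramp"],
#     "pg8000-1.19.2.zip",
# )
```

The resulting file is what you would then upload to S3 and reference in the Python library path.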

PS: You can also use Python's '.format' to make your query generic.
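
As an alternative to building the query string with '.format', pg8000.native's `run()` accepts named parameters written as `:name` in the SQL and passed as keyword arguments, which leaves value quoting to the driver. A minimal sketch, assuming a hypothetical table `my_table` with columns `id` and `status` (neither name comes from the original question), and a hypothetical helper `update_row`:

```python
def update_row(conn, row_id, new_status):
    """Update a single row through a pg8000.native-style connection.

    `conn` is expected to be an open pg8000.native.Connection.
    The table and column names below are hypothetical placeholders.
    """
    conn.run(
        "UPDATE my_table SET status = :status WHERE id = :id",
        status=new_status,  # bound to :status
        id=row_id,          # bound to :id
    )


# Hypothetical usage with the connection created in step 4:
# update_row(conn, row_id=42, new_status="processed")
# conn.close()
```

Only values can be passed this way; table or column names would still have to be placed in the SQL text itself.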
