
AWS Glue transform

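I'm trying to read the Input.csv file from an S3 bucket, get the distinct values (and do some other transformations), and then write the result to a Target.csv file, but I run into problems when trying to write the data to Target.csv in the S3 bucket.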

Here is the code:

import sys
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from awsglue.job import Job

glueContext = GlueContext(SparkContext.getOrCreate())

# Read the CSV from S3; withHeader makes the first row supply the column names (Col1, Col2, Col3)
dfnew = glueContext.create_dynamic_frame_from_options(
    "s3", {"paths": ["s3://bucket_name/Input.csv"]},
    format="csv", format_options={"withHeader": True})

# Keep Col2 and Col3, then drop duplicate rows with a Spark DataFrame
dfMod = dfnew.select_fields(["Col2", "Col3"]).toDF().distinct()

# Convert back to a DynamicFrame so it can be written with write_dynamic_frame
dnFrame = DynamicFrame.fromDF(dfMod, glueContext, "test_nest")

# Write the result to S3 as CSV
datasink = glueContext.write_dynamic_frame.from_options(
    frame=dnFrame, connection_type="s3",
    connection_options={"path": "s3://bucket_name/Target.csv"},
    format="csv", transformation_ctx="datasink")

Here is the data in Input.csv:

Col1    Col2    Col3
1       1       -30.4
2       2       -30.5
3       3        6.70
4       4        5.89
5       4        6.89
6       4        6.70
7       4        5.89
8       4        5.89
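As a quick way to see what the distinct step should produce on this sample, here is a plain-Python sketch (no Spark or Glue involved) that keeps Col2 and Col3 and drops repeated pairs. It assumes the real Input.csv is comma-separated with a header row (the table above is shown space-aligned) and compares values as strings; distinct_pairs is a hypothetical helper, not part of Glue.

```python
import csv
import io

# Sample data from the question, assumed to be comma-separated with a header row.
SAMPLE_CSV = """Col1,Col2,Col3
1,1,-30.4
2,2,-30.5
3,3,6.70
4,4,5.89
5,4,6.89
6,4,6.70
7,4,5.89
8,4,5.89
"""


def distinct_pairs(text, fields=("Col2", "Col3")):
    """Project each row onto `fields` and drop duplicates, keeping first-seen order."""
    reader = csv.DictReader(io.StringIO(text))
    projected = (tuple(row[f] for f in fields) for row in reader)
    # dict.fromkeys keeps insertion order and discards repeated keys.
    return list(dict.fromkeys(projected))


if __name__ == "__main__":
    for pair in distinct_pairs(SAMPLE_CSV):
        print(pair)
```

On the sample rows this gives six pairs, because (4, 5.89) appears three times. Spark's distinct() should return the same set of rows, but it does not guarantee their order.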

Error:

val dfmod = dfnew.select_fields(["Col2","Col3"]).toDF().distinct().show()
^
SyntaxError: invalid syntax

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/amazon/bin/runscript.py", line 92, in <module>
    while "runpy.py" in new_stack.tb_frame.f_code.co_filename:
AttributeError: 'NoneType' object has no attribute 'tb_frame'

I understand this is because I'm using create_dynamic_frame_from_options instead of from_catalog, but how can I get the functionality I need while using from_options (since my data is a CSV file in S3)?

IAM (Glue service policy):

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3:PutObject"
            ],
            "Resource": [
                "arn:aws:s3:::bucket_Name/Output/**/**/*"
            ]
        }
    ]
}

S3 bucket policy:

{
    "Version": "2012-10-17",
    "Id": "Policy***",
    "Statement": [
        {
            "Sid": "Stmt1***",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::account_number:root"
            },
            "Action": "s3:*",
            "Resource": "arn:aws:s3:::bucket_name"
        }
    ]
}
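Both policies grant access only on specific Resource patterns, while the job reads s3://bucket_name/Input.csv and writes to s3://bucket_name/Target.csv. As a rough check of which object ARNs each pattern covers, the sketch below approximates IAM wildcard matching with Python's fnmatch.fnmatchcase, assuming IAM's `*` matches any run of characters including `/`. It is an approximation, not AWS's policy evaluator; resource_matches and object_arn are hypothetical helpers, and the bucket placeholder is lower-cased in both patterns so it lines up with the job's paths.

```python
from fnmatch import fnmatchcase

# Resource patterns from the two policies in the question
# (bucket placeholder lower-cased to match the job's s3://bucket_name paths).
IAM_RESOURCE = "arn:aws:s3:::bucket_name/Output/**/**/*"
BUCKET_POLICY_RESOURCE = "arn:aws:s3:::bucket_name"


def resource_matches(pattern, arn):
    """Approximate IAM Resource matching: '*' matches any run of characters
    (including '/'). fnmatchcase also treats '[...]' as a character class,
    which IAM does not, so only use it with patterns without brackets."""
    return fnmatchcase(arn, pattern)


def object_arn(bucket, key):
    """Build the ARN of an S3 object."""
    return f"arn:aws:s3:::{bucket}/{key}"


if __name__ == "__main__":
    for key in ("Input.csv", "Target.csv", "Output/Target.csv", "Output/a/b/part.csv"):
        arn = object_arn("bucket_name", key)
        print(key, resource_matches(IAM_RESOURCE, arn),
              resource_matches(BUCKET_POLICY_RESOURCE, arn))
```

Under these assumptions, neither pattern covers the job's input or output objects: the IAM pattern only matches keys at least two `/` levels below Output/, and a bare bucket ARN such as arn:aws:s3:::bucket_name refers to the bucket itself, not the objects in it (object-level actions such as s3:GetObject need a resource like arn:aws:s3:::bucket_name/*).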

Please help.

Answer:

The syntax error on the line

val dfMod = dfnew.select_fields(["Col2","Col3"]).toDF().distinct().show()

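can be corrected as follows. We don't need val (that is Scala syntax, not Python) or show() (it only prints the rows and returns None). Without them the expression returns a DataFrame, which we convert to a DynamicFrame before passing it to write_dynamic_frame. You also need to add the import statement from awsglue.dynamicframe import DynamicFrame at the top of the script.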

dfMod = dfnew.select_fields(["Col2", "Col3"]).toDF().distinct()
dnFrame  = DynamicFrame.fromDF(dfMod, glueContext, "test_nest")
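The select_fields call should receive its field names as a single list: Glue documents it as select_fields(paths, transformation_ctx="", ...), where paths is a list of field paths, so passing the names as two separate strings binds the second one to transformation_ctx. The stub below is a hypothetical stand-in (not the awsglue class) that mirrors only those first two parameters to show the binding.

```python
def select_fields(paths, transformation_ctx=""):
    """Hypothetical stand-in mirroring the first two parameters of Glue's
    DynamicFrame.select_fields; returns what each parameter received."""
    return {"paths": paths, "transformation_ctx": transformation_ctx}


if __name__ == "__main__":
    # Two positional strings: "Col3" lands in transformation_ctx, not in paths.
    print(select_fields("Col2", "Col3"))
    # A single list passes both field names as paths.
    print(select_fields(["Col2", "Col3"]))
```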
