How to convert a JSON file to CSV in S3 and save it to the same S3 bucket using a Glue job
AWS Glue - convert the JSON response from a GET (REST API) request to a DataFrame/DynamicFrame and store it in an S3 bucket
```python
headersAPI = {
    'Content-Type': 'application/json',
    'accept': 'application/json',
    'Authorization': 'Bearer XXXXXXXXXXXXXXXXXXXXXXXXXX',
}
skill_response = requests.get("XXXXXX", headers=headersAPI)
log.info(skill_response.text)
skill_json = skill_response.json()
print(skill_json)  # printed the JSON data and verified it
log.info('skills data')
log.info(skill_json["status"])
DataSink0 = glueContext.write_dynamic_frame.from_options(
    frame=skill_json, connection_type="s3", format="csv",
    connection_options={"path": "s3://xxxxx/", "partitionKeys": []},
    transformation_ctx="DataSink0")
job.commit()
```
TypeError: frame_or_dfc must be DynamicFrame or DynamicFrameCollection. Got <class 'dict'>
When writing to S3, I also get this error: 'dict' object has no attribute '_jdf'
You can convert the JSON response to a DynamicFrame by first creating a DataFrame from the response string (as discussed here), then converting that DataFrame to a DynamicFrame.
This example should work:
```python
import requests
from awsglue.job import Job
from pyspark.context import SparkContext
from awsglue import DynamicFrame
from awsglue.context import GlueContext

sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)

r = requests.get(url='https://api.github.com/users?since=100')

# Build a DataFrame from the raw JSON text, then convert it to a DynamicFrame
df = spark.read.json(sc.parallelize([r.text]))
dynamic_frame = DynamicFrame.fromDF(
    df, glue_ctx=glueContext, name="df"
)
# dynamic_frame.show()

DataSink0 = glueContext.write_dynamic_frame.from_options(
    frame=dynamic_frame,
    connection_type="s3", format="csv",
    connection_options={"path": "s3://xxxxx/", "partitionKeys": []},
    transformation_ctx="DataSink0")
job.commit()
```
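As a quick local sanity check before running the Glue job, the same JSON-to-CSV conversion can be sketched with only the Python standard library, assuming the API returns a flat JSON array of objects (the sample payload below is made up for illustration):

```python
import csv
import io
import json

# Hypothetical payload shaped like the GitHub users response:
# a flat JSON array of objects with the same keys in each element.
payload = json.loads('[{"login": "a", "id": 1}, {"login": "b", "id": 2}]')

# Write one CSV row per JSON object, with a header derived from the keys.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=sorted(payload[0].keys()))
writer.writeheader()
writer.writerows(payload)
print(buf.getvalue())
```

If the real response contains nested objects or arrays, CSV cannot represent them directly; in Glue you would typically flatten the DynamicFrame first before writing with `format="csv"`.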