Write a Spark DataFrame or write a Glue DynamicFrame: which option is better in AWS Glue?
In AWS Glue, I read data from the Data Catalog into a Glue DynamicFrame, then convert the DynamicFrame to a Spark DataFrame to apply schema transformations. To write the data back to S3, I have seen developers convert the DataFrame back to a DynamicFrame. Is there any advantage to writing a Glue DynamicFrame over writing a Spark DataFrame?
Some functionality is available only through the DynamicFrame writer class and cannot be accessed when using DataFrames:
from_jdbc_conf
Writing Parquet using glueparquet as a format
These are some of the use cases I can think of. If you have a use case that requires save modes, for example mode('overwrite'), you could use DataFrames. A similar approach exists for DynamicFrames, but it is implemented slightly differently: take a look at [purge_s3_path][3], then write.
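The two overwrite approaches mentioned in the answer can be sketched roughly as follows. This is a minimal illustration, not a definitive implementation: the function names and the S3 path are hypothetical, `glue_context` is assumed to be an existing `GlueContext` created elsewhere in the job, and a `retentionPeriod` of 0 hours is assumed to be acceptable, since it clears existing files under the path.

```python
def overwrite_with_dynamic_frame(glue_context, dynamic_frame, s3_path):
    """Approximate mode('overwrite') for a DynamicFrame: purge, then write.

    Hypothetical helper; glue_context is assumed to be an awsglue GlueContext.
    """
    # retentionPeriod is given in hours (the default is 168, i.e. 7 days).
    # Setting it to 0 is assumed here so that existing files are removed.
    glue_context.purge_s3_path(s3_path, options={"retentionPeriod": 0})

    # Write with the Glue-specific "glueparquet" format, which is only
    # available through the DynamicFrame writer.
    glue_context.write_dynamic_frame.from_options(
        frame=dynamic_frame,
        connection_type="s3",
        connection_options={"path": s3_path},
        format="glueparquet",
    )


def overwrite_with_dataframe(data_frame, s3_path):
    """Plain Spark alternative: the DataFrame writer supports save modes."""
    data_frame.write.mode("overwrite").parquet(s3_path)
```

In a Glue job this might be called as `overwrite_with_dynamic_frame(glue_context, dyf, "s3://example-bucket/output/")`, where the bucket name is a placeholder. Note that the purge and the write are two separate steps, so unlike a single writer call, a failure between them can leave the path empty.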