
Unable to parse file from AWS Glue dynamic_frame to PySpark DataFrame

I am new to AWS Glue.

I am having trouble converting a Glue DynamicFrame to a PySpark DataFrame.

Below is the crawler configuration I created for reading the CSV file, and the code that reads it:

from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

glue_cityMapDB = "csvDb"
glue_cityMapTbl = "csv table"

datasource2 = glue_context.create_dynamic_frame.from_catalog(
    database=glue_cityMapDB,
    table_name=glue_cityMapTbl,
    transformation_ctx="datasource2",
)

datasource2.show()

print("Show the data source2 city DF")
cityDF = datasource2.toDF()
cityDF.show()

Output:

datasource2.show() prints the data from the Glue DynamicFrame as expected, but after converting it to a PySpark DataFrame I get the following error:

S3NativeFileSystem (S3NativeFileSystem.java:open(1208)) - Opening 's3://s3source/read/names.csv' for reading
2020-04-24 05:08:39,789 ERROR [Executor task launch worker for task

I'd appreciate any help with this.

Make sure the file is UTF-8 encoded. You can check the encoding with the file command, and convert it with iconv or a text editor such as Sublime Text.
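If the file and iconv commands aren't handy, the same check and conversion can be sketched in plain Python. This is a minimal sketch: the helper names is_utf8 and convert_to_utf8 are hypothetical, and you must know (or guess) the file's real source encoding; ISO-8859-1 (latin-1) in the usage note is only an assumption.

```python
import codecs
import shutil


def is_utf8(path, chunk_size=1 << 16):
    """Return True if the whole file decodes cleanly as UTF-8."""
    # An incremental decoder lets us validate large files chunk by chunk
    # without breaking multi-byte characters that straddle chunk boundaries.
    decoder = codecs.getincrementaldecoder("utf-8")()
    with open(path, "rb") as f:
        try:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                decoder.decode(chunk)
            decoder.decode(b"", final=True)  # flush any incomplete trailing sequence
        except UnicodeDecodeError:
            return False
    return True


def convert_to_utf8(src, dst, source_encoding):
    """Re-encode src (in source_encoding) into dst as UTF-8.

    source_encoding is an assumption the caller must supply, e.g. "latin-1".
    newline="" keeps the original line endings untouched, which matters for CSV.
    """
    with open(src, "r", encoding=source_encoding, newline="") as fin, \
         open(dst, "w", encoding="utf-8", newline="") as fout:
        shutil.copyfileobj(fin, fout)
```

For example, convert_to_utf8("names.csv", "names_utf8.csv", "latin-1") followed by is_utf8("names_utf8.csv") should return True; the converted file would then need to be uploaded back to S3 in place of the original.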

You can also read the file directly into a DataFrame with:

df = spark.read.csv('s3://s3source/read/names.csv')

and then convert it to a DynamicFrame with DynamicFrame.fromDF().
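Putting those two steps together, a Glue job might look roughly like the following. This is a sketch under assumptions: the helper name csv_to_dynamic_frame is hypothetical, the header and encoding options are guesses about the file (the answer above passes no options), and glue_context is assumed to be an existing GlueContext like the one in the question.

```python
def csv_to_dynamic_frame(glue_context, s3_path, name="names"):
    """Read a CSV from S3 with Spark, then wrap it as a Glue DynamicFrame.

    Hypothetical helper; glue_context is assumed to be an
    awsglue.context.GlueContext created elsewhere in the job.
    """
    # Imported inside the function so this module can be loaded
    # outside a Glue environment.
    from awsglue.dynamicframe import DynamicFrame

    spark = glue_context.spark_session
    df = (
        spark.read
        # Assumption: the first row holds column names.
        .option("header", "true")
        # Assumption: the file is (or has been converted to) UTF-8.
        .option("encoding", "UTF-8")
        .csv(s3_path)
    )
    # fromDF(dataframe, glue_ctx, name) wraps the Spark DataFrame.
    return DynamicFrame.fromDF(df, glue_context, name)
```

Usage would be along the lines of dyf = csv_to_dynamic_frame(glue_context, "s3://s3source/read/names.csv") followed by dyf.show(). Reading through Spark directly also bypasses the crawler's table, which helps narrow down whether the problem is in the file itself or in the catalog definition.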

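Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original source. For any questions, contact yoyou2525@163.com.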

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM