
Unable to parse file from AWS Glue DynamicFrame to PySpark DataFrame

I am new to AWS Glue.

I am facing an issue converting a Glue DynamicFrame to a PySpark DataFrame.

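Below is the code I use to read the CSV file through the Data Catalog table that my crawler created:

from awsglue.context import GlueContext
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

# Data Catalog database and table created by the crawler
glue_cityMapDB = "csvDb"
glue_cityMapTbl = "csv table"

# Read the crawled CSV table as a DynamicFrame
datasource2 = glue_context.create_dynamic_frame.from_catalog(database=glue_cityMapDB, table_name=glue_cityMapTbl, transformation_ctx="datasource2")
datasource2.show()

# Convert the DynamicFrame to a PySpark DataFrame
print("Show the data source2 city DF")
cityDF = datasource2.toDF()
cityDF.show()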

Output:

datasource2.show() prints the rows from the Glue DynamicFrame as expected, but after converting it to a PySpark DataFrame with toDF(), I get the following error:

S3NativeFileSystem (S3NativeFileSystem.java:open(1208)) - Opening 's3://s3source/read/names.csv' for reading
2020-04-24 05:08:39,789 ERROR [Executor task launch worker for task

I would appreciate it if anybody could help with this.

Make sure the file is UTF-8 encoded. You can check the encoding with the file command, and convert it with iconv or a text editor such as Sublime Text.
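As a rough Python equivalent of checking with file and converting with iconv, the sketch below tries to decode the raw bytes as UTF-8 and, if that fails, writes a re-encoded copy. The source encoding (Latin-1) is only a placeholder assumption; substitute whatever encoding the file was actually written in. The function names and the ".utf8" output suffix are hypothetical.

```python
import sys


def is_utf8(data: bytes) -> bool:
    """Return True if the raw bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def convert_to_utf8(data: bytes, source_encoding: str = "latin-1") -> bytes:
    """Re-encode bytes to UTF-8.

    source_encoding is an assumption (Latin-1 here); use the file's real encoding.
    """
    return data.decode(source_encoding).encode("utf-8")


if __name__ == "__main__":
    # Usage: python check_utf8.py names.csv [more files...]
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            raw = f.read()  # reads the whole file into memory
        if is_utf8(raw):
            print(f"{path}: valid UTF-8")
        else:
            out_path = path + ".utf8"  # hypothetical naming for the converted copy
            with open(out_path, "wb") as f:
                f.write(convert_to_utf8(raw))
            print(f"{path}: not UTF-8, wrote converted copy to {out_path}")
```

Because the whole file is read into memory, this is meant for spot-checking a local copy of the CSV before uploading it back to S3, not for very large files.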

You can also read the file directly as a Spark DataFrame:

# spark is the SparkSession; in a Glue job it is available as glue_context.spark_session
df = spark.read.csv('s3://s3source/read/names.csv')

Then convert it to a DynamicFrame with DynamicFrame.fromDF() if you need one.
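A minimal sketch of that fallback, assuming it runs inside a Glue job where a GlueContext is available: it reads the CSV through the Spark session attached to the GlueContext and wraps the result with DynamicFrame.fromDF(dataframe, glue_ctx, name). The helper name, the default frame name, and the header/encoding options are illustrative assumptions, not part of the original answer.

```python
def read_csv_as_dynamic_frame(glue_context, path, name="names_csv", header=True, encoding="UTF-8"):
    """Read a CSV with Spark, then wrap it as a Glue DynamicFrame.

    Hypothetical defaults: the file has a header row and is UTF-8 encoded.
    """
    # Imported lazily so this helper can be defined outside a Glue environment.
    from awsglue.dynamicframe import DynamicFrame

    df = (
        glue_context.spark_session.read
        .option("header", header)      # assumption: first row holds column names
        .option("encoding", encoding)  # Spark CSV reader option for the file charset
        .csv(path)
    )
    # Convert the Spark DataFrame back into a DynamicFrame for Glue transforms
    return DynamicFrame.fromDF(df, glue_context, name)
```

Inside the job it would be called as read_csv_as_dynamic_frame(glue_context, "s3://s3source/read/names.csv"). Reading through Spark directly bypasses the crawler's table definition, which helps isolate whether the problem is in the catalog schema or in the file itself.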
