Newest questions tagged [aws-glue]

Unable to save partitioned data in Iceberg format when using S3 and Glue

I am getting the following error. This is the query I am running on Spark 3.3, with the Glue catalog, saving to S3. The Iceberg version is 1.1.0. But ...

How to read a .sql file stored in S3 containing multiple SQL statements

I have a .sql file stored in an S3 location in AWS which contains multiple SQL statements separated by semicolons, as below: tried using 2 methods in A ...

AWS Glue Crawler and JDBC connection: "Expected string length >= 1, but found 0 for params.Targets.JdbcTargets[0].customJdbcDriverClassName"

I am trying to set up an AWS Glue Crawler using a JDBC connection in order to populate my AWS Glue Data Catalog databases. I already have a Connection ...

How to deal with failing Athena queries as AWS Glue Data Catalog metadata size grows large?

Based on my research, the easiest and most straightforward way to get metadata out of Glue's Data Catalog is using Athena and querying the infor ...

Is there a way to extract the Glue job ID from the PySpark script?

I am new to AWS Glue and I am trying to process a CSV file in S3 that has already been cataloged by a crawler, rename the columns and add some ad ...

Unable to read JSON files in AWS Glue using Apache Spark

Serverless Framework not getting local files

I have a problem with Serverless Framework: I want to create a Glue job, but when creating the resource I can only choose an S3 path. Why can't I choose ...

Glue not able to recognize Delta Lake Python Library

I am trying to use Delta Lake Python Library in my Glue job. However, my Glue job is not able to recognize it and I get the error "NameError: name 'De ...

AWS Glue ExecutorLostFailure (executor 15 exited caused by one of the running tasks) Reason: Remote RPC client disassociated

I have a simple Glue job where I am using PySpark to read 14 million rows from RDS using JDBC and then trying to save them into S3. I can see Output logs ...

How to cast a nested array in a Glue job

I have this schema in the AWS Glue job: I can cast a string of FilteredOutDecisions.ApprovedAmount to double using the resolveChoice() method: But I ...

AWS Glue Workflow triggering one job multiple times (incorrect behavior)

I have a big Glue workflow (about 100 jobs / crawlers), and it was executing properly until last week. Since then, my first conditional trigger (ALL), ...

Datasketches .whl Linux ARM64 for AWS Glue Job

I am having trouble installing Python datasketches==4.0.0 on Linux ARM64. I receive the following error when I run pip3 install datasketches==4.0.0: ...

How to read the input state in a Step Function from a Glue Python job?

I have a step function that generates the following input for the next step: where the fields in "input":[...] are the output of other steps. The ...

Update/Delete an Item in DynamoDB using a Glue Job

I am working on accessing DynamoDB from a Glue job using PySpark. Currently I am writing an entry to DynamoDB using the write_dynamic_frame ...

Update the schedule of a Glue crawler on AWS

I have created an AWS Glue crawler to update/sync data between S3 and Athena tables using create_crawler. I have used the Schedule parameter to run it on a ...

Deleting records from Apache Hudi Table which is part of Glue Tables created using AWS Glue Job and Kinesis

I currently have a DynamoDB stream configured which feeds records into Kinesis Data Streams whenever an insert or update happens and subsequent ...

Best way to convert JSON to Apache Parquet format using AWS

I've been working on a project where I've been storing IoT data in an S3 bucket and batching it using AWS Kinesis Firehose. I have a Lambda functio ...

Move deeply nested fields one level up in a PySpark DataFrame

I have a PySpark DataFrame created from XML. Because of the way the XML is structured I have an extra, unnecessary level of nesting in the schema of the d ...

Error when I'm trying to connect to Athena from Power BI

I get this error: ODBC: ERROR [HY000] [Simba][Athena] (1041) An error has been thrown from the AWS Glue client. Athena Error No: 15, HTTP Response ...

Conversion of datetime format

I have a column named requestdatetime with data type string. Its values are in the format 15/Aug/2022:01:54:41 +0000. I need to convert 15/Aug/ ...
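
For the requestdatetime format quoted in this last question, a minimal sketch of parsing such a string with Python's standard library might look like the following. The target format is cut off in the excerpt, so the ISO 8601 output is only an assumption, and `parse_request_datetime` is a hypothetical helper name, not something from the question.

```python
from datetime import datetime

# Layout of the source string, e.g. "15/Aug/2022:01:54:41 +0000".
# %b matches month abbreviations in the current locale (English assumed here).
SOURCE_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def parse_request_datetime(value: str) -> datetime:
    """Parse a requestdatetime string into a timezone-aware datetime."""
    return datetime.strptime(value, SOURCE_FORMAT)


parsed = parse_request_datetime("15/Aug/2022:01:54:41 +0000")

# ISO 8601 is an assumed target; the format the asker wants is truncated.
iso_value = parsed.isoformat()
print(iso_value)
```

This is plain Python rather than anything Glue-specific; inside a Glue job the same format string would need to be translated to whatever date pattern syntax the chosen Spark function expects.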