
How can I preview AWS Glue jobs?

I want to use Glue to extract data from an RDS PostgreSQL database, transform and clean it, and load it into an S3 bucket so I can use Athena and QuickSight to visualize the data and create reports.

I'm currently authoring the Glue job for the data cleanup (removing NULL values and the like), but I can see no easy way to preview the results of the job script. I can only see the results in the S3 bucket after running the complete job. The job takes at least 10 minutes to start and a few more to finish, so I have a round-trip time of about 15 minutes to see whether my code is correct. Is this supposed to be the workflow here? Am I missing anything?

I'm new to the whole BI/data field, so maybe I'm following the wrong approach. I want to visualize data from RDS in QuickSight and need to do some data cleanup first. Are there other approaches that make sense for this scenario? (We are talking about a small dataset of a few hundred MB.)

Thanks!

Look into notebooks. You can set them up in the AWS Glue console. They give you an interactive way of writing your code before you put it into a Glue job script. For standard cases there is no big difference between SageMaker (Jupyter) and Zeppelin notebooks; I guess it comes down to your taste.

In general, especially with small datasets, a local development environment might work for you as well and gives you even more freedom. For larger datasets, a common practice is to take a sample of only a few hundred records so it can be processed almost instantly. That helps a lot during development.
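To make the sample-first idea concrete, here is a minimal sketch in plain Python with no Glue or Spark dependency. The helper names (`take_sample`, `drop_null_rows`), the column names, and the rows are all made up for illustration; in practice the rows would come from an export of your table, and the same cleanup logic would later be ported into the job script.

```python
import random


def take_sample(rows, n, seed=0):
    """Return up to n rows chosen at random (seeded so reruns are repeatable)."""
    rows = list(rows)
    if len(rows) <= n:
        return rows
    return random.Random(seed).sample(rows, n)


def drop_null_rows(rows, required):
    """Keep only rows where every required column is present and not None or empty."""
    return [r for r in rows if all(r.get(c) not in (None, "") for c in required)]


# Hypothetical rows standing in for an export of the RDS table.
rows = [
    {"id": 1, "name": "alice", "amount": 10.0},
    {"id": 2, "name": None, "amount": 5.5},
    {"id": 3, "name": "carol", "amount": None},
    {"id": 4, "name": "dave", "amount": 7.25},
]

# Work on a few hundred records at most, so each iteration takes seconds.
sample = take_sample(rows, n=300)
cleaned = drop_null_rows(sample, required=["name", "amount"])
print(cleaned)
```

Once the cleanup rules behave as expected on the sample, you only need the slow full job run for a final check.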

And last: I'm not sure why you want to move away from Postgres. What kind of analysis do you want to do that you can't do in the relational world? Also, why not do the cleanup in the DB?
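As a rough illustration of doing the cleanup in the database, the sketch below defines a view that filters out NULL rows and trims whitespace. It uses Python's built-in `sqlite3` as a stand-in so it runs anywhere; that is an assumption for the demo, not your setup. The table `orders`, the view `orders_clean`, and the columns are hypothetical. The SQL is kept simple enough that an equivalent view should be straightforward to write in Postgres.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for the real Postgres instance.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL);
    INSERT INTO orders VALUES
        (1, 'alice', 10.0),
        (2, NULL, 5.5),
        (3, 'carol', NULL),
        (4, '  dave ', 7.25);

    -- The view holds the cleanup rules; the raw table stays untouched.
    CREATE VIEW orders_clean AS
    SELECT id, TRIM(customer) AS customer, amount
    FROM orders
    WHERE customer IS NOT NULL AND amount IS NOT NULL;
    """
)

clean_rows = conn.execute(
    "SELECT id, customer, amount FROM orders_clean ORDER BY id"
).fetchall()
print(clean_rows)
conn.close()
```

Because the rules live in a view, you can check the result with an ordinary query in seconds, and any reporting tool that can read from the database would see the cleaned rows.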


 