简体   繁体   中英

How Glue crawler load data in Redshift table?

I am a new AWS user and got confused about its services. In our company, we stored our data in S3 therefore I created a bucket in s3 and created an AWS Glue crawler to load this table to the Redshift table (what we normally do in our company), which I successfully can see on Redshift.

Based on my research the Glue crawler should create metadata related to my data in the Glue data catalog which again I am able to see. Here is my question: How my crawler works and does it load S3 data to Redshift? Should my company have a special configuration that lets me load data to Redshift?

Thanks

AWS Glue does not natively interact with Amazon Redshift.

Load data from Amazon S3 to Amazon Redshift using AWS Glue - AWS Prescriptive Guidance provides an example of using AWS Glue to load data into Redshift, but it simply connects to it like a generic JDBC database.

It appears that you can Query external data using Amazon Redshift Spectrum - Amazon Redshift , but this is Redshift using the AWS Glue Data Catalog to access data stored in Amazon S3. The data is not "loaded" into Redshift. Rather, the External Table definition in Redshift tells it how to access the data directly in S3. This is very similar to Amazon Athena, which queries data stored in S3 without having to load it into a database. (Think of Redshift Spectrum as being Amazon Athena inside Amazon Redshift.)

So, there are basically two ways to query data using Amazon Redshift:

  1. Use the COPY command to load the data from S3 into Redshift and then query it, OR
  2. Keep the data in S3, use CREATE EXTERNAL TABLE to tell Redshift where to find it (or use an existing definition in the AWS Glue Data Catalog), then query it without loading the data into Redshift itself.

I figured out what I meant by seeing the tables in Redshift after running crawler. In fact, I created an external table in Redshift not store the table to Redshift.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM