
Put data from AWS Kinesis into different buckets based on data type

I've followed the setup described in this tutorial to configure a data pipeline from Aurora all the way to Redshift. I've got this working perfectly for one table, e.g. Sales.

Now I want to expand this to bring in data from other tables as well, e.g. Products and Categories, so that each data type ends up in a separate table in Redshift: a Sales table, a Products table, and a Categories table.

How do I do this with a Kinesis/S3/Redshift setup?

Redshift is able to bring data in from one S3 location only. Similarly, Kinesis can be configured to put data into one S3 location only. I'm trying to find a way to route my records from Kinesis by data type so that they go into different S3 locations and I can pull them into separate Redshift tables.

The obvious solution is to have more than one stream, each corresponding to a data type, but I think this will be expensive. What options are there?

Good news: in Kinesis Data Firehose you pay only for the amount of data your pipeline processes, plus data conversions (if applicable). So you can have several separate streams, and it shouldn't be more expensive than a single one.
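With one delivery stream per table, the producer has to pick the stream for each record. Here is a minimal sketch of that routing in Python, not a definitive implementation. The stream names, the `table`/`row` record shape, and the naive comma-joined payload are all assumptions made for illustration; `put_record_batch` is the boto3 Firehose client call, and the client is passed in so the grouping logic can be tried without AWS access.

```python
from collections import defaultdict

# Hypothetical mapping from source table to Firehose delivery stream name.
STREAM_FOR_TABLE = {
    "sales": "cdc-sales",
    "products": "cdc-products",
    "categories": "cdc-categories",
}

MAX_BATCH = 500  # PutRecordBatch accepts at most 500 records per call


def group_by_stream(records):
    """Group records by destination stream using each record's 'table' key."""
    batches = defaultdict(list)
    for record in records:
        stream = STREAM_FOR_TABLE[record["table"]]
        # Naive CSV line; assumes values contain no commas or newlines.
        line = ",".join(str(value) for value in record["row"]) + "\n"
        batches[stream].append({"Data": line.encode("utf-8")})
    return dict(batches)


def send(records, firehose):
    """Send each group to its own delivery stream in chunks of MAX_BATCH.

    Retrying records reported as failed in the response is left out here.
    """
    for stream, batch in group_by_stream(records).items():
        for start in range(0, len(batch), MAX_BATCH):
            firehose.put_record_batch(
                DeliveryStreamName=stream,
                Records=batch[start:start + MAX_BATCH],
            )
```

In a real producer this would be called as `send(records, boto3.client("firehose"))`, with each delivery stream configured to write to its own S3 prefix.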

Regarding Redshift Spectrum, you can actually bring in data from as many locations as you need. If you look at the post you linked, there is a create table statement like this:

    CREATE EXTERNAL TABLE IF NOT EXISTS spectrum_schema.ecommerce_sales(
      ItemID int,
      Category varchar,
      Price DOUBLE PRECISION,
      Quantity int,
      OrderDate TIMESTAMP,
      DestinationState varchar,
      ShippingType varchar,
      Referral varchar)
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    LINES TERMINATED BY '\n'
    LOCATION 's3://{BUCKET_NAME}/CDC/'

In that statement, the last line references the location of the S3 files to include in the table. You would configure several streams, one per table/S3 location, but you can use a single Redshift cluster to query all your tables.
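To make the "one S3 location per table" idea concrete, here is a small Python sketch that renders one external table statement per table, each with its own `LOCATION`. This is only an illustration: the `products` table, its columns, the bucket name, and the prefixes are hypothetical and do not come from the linked post.

```python
def external_table_ddl(schema, table, columns, bucket, prefix):
    """Render a CREATE EXTERNAL TABLE statement with a table-specific LOCATION."""
    cols = ",\n".join(f"  {name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {schema}.{table}(\n"
        f"{cols})\n"
        "ROW FORMAT DELIMITED\n"
        "FIELDS TERMINATED BY ','\n"
        "LINES TERMINATED BY '\\n'\n"
        f"LOCATION 's3://{bucket}/{prefix}/'"
    )


# Hypothetical tables: the names, columns, and prefixes are placeholders.
TABLES = {
    "ecommerce_sales": ("CDC/sales", [("ItemID", "int"), ("Price", "DOUBLE PRECISION")]),
    "products": ("CDC/products", [("ProductID", "int"), ("Name", "varchar")]),
}

for table, (prefix, columns) in TABLES.items():
    print(external_table_ddl("spectrum_schema", table, columns, "my-cdc-bucket", prefix))
    print()
```

Each generated statement differs only in the table name, column list, and S3 prefix, which matches the layout of one delivery stream per table writing to its own location.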
