
Kinesis vs KPL vs KCL

This is somewhat of a shallow-level question. However, I am perplexed by this trio of services.

I understand that the KPL produces fast data and the KCL consumes fast data produced by Kinesis. However, what I fail to understand is this: if the KPL and KCL make up this pair, what do we need AWS Kinesis for?

Another way to look at it: if AWS Kinesis can produce the fast data and the KCL can consume it, then what do we need the KPL for?

Any clarifying answer is greatly appreciated.

The Kinesis Producer Library (KPL) aggregates small user-formatted records into larger records up to 1 MB to make better use of Amazon Kinesis Data Streams throughput.

The KCL for Java, in turn, supports deaggregating these records.

Refer to this for more: https://docs.aws.amazon.com/streams/latest/dev/shared-throughput-kcl-consumers.html

One problem: the KCL and KPL are heavily focused on Java, but most data scientists love Python. One can always build on the amazon-kinesis-client-python library, which sits on top of the Java MultiLangDaemon and relies on interprocess communication, but it is not recommended.

AWS Kinesis is a very broad platform. Roughly, you can think of AWS Kinesis as: Kinesis Data Streams + Kinesis Video Streams + Kinesis Firehose + Kinesis Analytics. Each one has its own purpose.

More detail here: 
 https://aws.amazon.com/kinesis/

Now, let's take Kinesis Data Streams, for example. What if you are a developer and you need to feed data to a specific Kinesis Data Stream programmatically (i.e. from code rather than by hand)? This is where the KPL comes into play. You use the KPL to feed data to THAT stream.

Similar story with the KCL:

If you are a developer and you want to get ("consume") data from that data stream, you use the KCL.

In short: AWS Kinesis is a huge platform, within which the KCL and KPL serve specific purposes.

TL;DR:

  • If you don't use Java, you'll probably want to avoid the KPL/KCL.
  • KPL
    • Aggregates records to minimize the number of writes you perform
    • Retries on failure
    • Doesn't block your application
    • Is a library, not an AWS service
    • Is optional: you don't have to use it
    • It looks like you have to use Java
  • KCL
  • Kinesis Stream (multiple types are available)
    • The actual conduit through which your data travels
    • An AWS service

Details:

  • Let's assume you have a Kinesis Data Stream.
    This is the actual service run by AWS.
  • You want to put things into the stream on one end and pick them up on the other.
  • Kinesis data streams are composed of shards.
  • Each shard allows up to 1 MB/sec and 1,000 records/sec of write capacity.
  • You could get throttled if you go over either of those limits.
  • You want to make the most efficient use of your stream.
  • You could aggregate many records together until you approach 1 MB and then send the result over the wire using the Kinesis API, consuming just 1 write. On the other side, you then need to unwrap that record so you can treat each item as if it had been sent individually.
  • If you run into throttles, you'll want to retry later.
  • You could implement all of that logic on your own, or just use the KPL and KCL if you use Java (or if you're brave enough to figure out how to use them with other languages).
  • The KPL and KCL are libraries, not services.
  • Think of KPL as a library that efficiently packs/aggregates the records and puts them into your stream.
  • Think of KCL as a library that helps you unwrap the efficient packing of KPL for you.
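
To make the packing/unpacking idea above concrete, here is a minimal sketch in plain Python. It is not the KPL's actual aggregation format; it assumes a simple made-up framing (a 4-byte length prefix in front of each record), and the names `aggregate`, `deaggregate`, and `MAX_RECORD_BYTES` are hypothetical. It only illustrates why many small records can cost a single write.

```python
import struct

# Assumed per-write byte budget for this sketch (roughly the 1 MB limit above).
MAX_RECORD_BYTES = 1_000_000


def aggregate(records, max_bytes=MAX_RECORD_BYTES):
    """Pack small byte records into as few blobs as possible.

    Each record is framed as a 4-byte big-endian length plus its payload.
    This framing is an illustration only, not the KPL wire format.
    """
    blobs, current, size = [], [], 0
    for rec in records:
        framed = struct.pack(">I", len(rec)) + rec
        if len(framed) > max_bytes:
            raise ValueError("single record exceeds max_bytes")
        # Start a new blob when the next record would overflow the budget.
        if current and size + len(framed) > max_bytes:
            blobs.append(b"".join(current))
            current, size = [], 0
        current.append(framed)
        size += len(framed)
    if current:
        blobs.append(b"".join(current))
    return blobs


def deaggregate(blob):
    """Unpack one aggregated blob back into the original records."""
    records, offset = [], 0
    while offset < len(blob):
        (length,) = struct.unpack_from(">I", blob, offset)
        offset += 4
        records.append(blob[offset:offset + length])
        offset += length
    return records


# 1,000 tiny records fit comfortably in one blob, i.e. one write.
events = [f"event-{i}".encode() for i in range(1000)]
blobs = aggregate(events)
restored = [rec for blob in blobs for rec in deaggregate(blob)]
```

In this toy version the producer side would send each blob as one record and the consumer side would call `deaggregate` on every record it reads, which is roughly the division of labour between the two libraries.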

There's obviously more to it, but they appear to go well together if you happen to use Java. And if you don't use the KPL/KCL, you'll probably want to implement something that looks like them.
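
As a rough idea of what "implementing something like it" could involve for the retry part, here is a small sketch of retrying a throttled write with exponential backoff and jitter. Everything here is hypothetical: `FakeShard` is an in-memory stand-in that only enforces a records-per-second budget, `ThrottledError` stands in for whatever throttling error your client raises, and the backoff parameters are arbitrary. It is not how the KPL itself schedules retries.

```python
import random
import time


class ThrottledError(Exception):
    """Stand-in for a throughput-exceeded error from the stream."""


class FakeShard:
    """Hypothetical in-memory shard with a per-second record budget."""

    def __init__(self, max_records_per_sec=1000, clock=time.monotonic):
        self.max_records_per_sec = max_records_per_sec
        self.clock = clock
        self.window_start = clock()
        self.count = 0
        self.stored = []

    def put(self, record):
        now = self.clock()
        # Reset the counter once a full second has passed.
        if now - self.window_start >= 1.0:
            self.window_start, self.count = now, 0
        if self.count >= self.max_records_per_sec:
            raise ThrottledError("write limit exceeded")
        self.count += 1
        self.stored.append(record)


def put_with_retry(shard, record, max_attempts=5, base_delay=0.1,
                   sleep=time.sleep):
    """Write one record, backing off exponentially (with jitter) on throttles.

    Returns the number of attempts used; re-raises after max_attempts.
    """
    for attempt in range(max_attempts):
        try:
            shard.put(record)
            return attempt + 1
        except ThrottledError:
            if attempt == max_attempts - 1:
                raise
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay))
```

The `clock` and `sleep` parameters are there only so the behaviour can be exercised without real waiting; in real code you would call your actual client inside the `try` block instead of `shard.put`.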

Based on my research, it looks like you have to use Java if you want to use the KPL. You can use other languages with the KCL, but it looks complicated, and you may have to give up some of the features that drew you to the KPL/KCL in the first place (like aggregation).
