
Is it possible to use AWS KPL on Lambda connected to API Gateway

I am trying to build a data collection pipeline on top of AWS services. The overall architecture is given below.

In summary, the system should get events from API Gateway (1), with one request per event, and the data should be written to Kinesis (2).

I am expecting ~100k events per second. My question is about using the KPL in Lambda functions. In step 2 I am planning to write a Lambda function that uses the KPL to write events to Kinesis with high throughput. But I am not sure this is possible, since API Gateway invokes the Lambda function separately for each event.

Is it possible/reasonable to use the KPL in such an architecture, or should I use the Kinesis Put API instead?

        1                              2                              3                             4
+----------------+             +----------------+             +----------------+            +----------------+
|                |             |                |             |                |            |                |
|                |             |                |             |                |            |                |
|  AWS API GW    +-----------> |  AWS Lambda    +-----------> |  AWS Kinesis   +----------> |  AWS Lambda    |
|                |             |  Function with |             |  Streams       |            |                |
|                |             |  KPL           |             |                |            |                |
|                |             |                |             |                |            |                |
+----------------+             +----------------+             +----------------+            +-----+-----+----+
                                                                                                  |     |
                                                                                                  |     |
                                                                                                  |     |
                                                                                                  |     |
                                                                                                  |     |
                                                                                5                 |     |              6
                                                                         +----------------+       |     |      +----------------+
                                                                         |                |       |     |      |                |
                                                                         |                |       |     |      |                |
                                                                         |  AWS S3        <-------+     +----> |  AWS Redshift  |
                                                                         |                |                    |                |
                                                                         |                |                    |                |
                                                                         |                |                    |                |
                                                                         +----------------+                    +----------------+

I am also thinking about writing directly to S3 instead of calling the Lambda function from API Gateway. If the first architecture is not reasonable this may be a solution, but in that case there will be a delay before the data is written to Kinesis.

        1                                2                         3                              4                             5
+----------------+               +----------------+        +----------------+             +----------------+            +----------------+
|                |               |                |        |                |             |                |            |                |
|                |               |                |        |                |             |                |            |                |
|  AWS API GW    +----------->   |  AWS Lambda    +------> |  AWS Lambda    +-----------> |  AWS Kinesis   +----------> |  AWS Lambda    |
|                |               |  to write data |        |  Function with |             |  Streams       |            |                |
|                |               |  to S3         |        |  KPL           |             |                |            |                |
|                |               |                |        |                |             |                |            |                |
+----------------+               +----------------+        +----------------+             +----------------+            +-----+-----+----+
                                                                                                                              |     |
                                                                                                                              |     |
                                                                                                                              |     |
                                                                                                                              |     |
                                                                                                                              |     |
                                                                                                            6                 |     |              7
                                                                                                     +----------------+       |     |      +----------------+
                                                                                                     |                |       |     |      |                |
                                                                                                     |                |       |     |      |                |

I do not think using the KPL is the right choice here. The key concept of the KPL is that records are collected at the client and then sent to Kinesis as a batch operation. Since Lambdas are stateless per invocation, it would be rather difficult to store the records for aggregation (before sending them to Kinesis).
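To make the cost of losing client-side aggregation concrete, here is a rough sizing sketch. It uses the documented per-shard write limits of Kinesis Data Streams (1,000 records per second and 1 MB per second). The class name and the 200-byte average event size are assumptions for illustration, not figures from the question.

```java
// Back-of-the-envelope sketch: minimum shard count when every event is sent
// as its own Kinesis record (no aggregation). Names here are illustrative.
final class ShardEstimate {
    // Documented per-shard write limits for Kinesis Data Streams.
    static final long RECORDS_PER_SHARD_PER_SEC = 1_000;
    static final long BYTES_PER_SHARD_PER_SEC = 1_000_000; // 1 MB/s

    // Returns the larger of the record-count bound and the byte-rate bound.
    static long minShards(long eventsPerSec, long avgEventBytes) {
        long byCount = ceilDiv(eventsPerSec, RECORDS_PER_SHARD_PER_SEC);
        long byBytes = ceilDiv(eventsPerSec * avgEventBytes, BYTES_PER_SHARD_PER_SEC);
        return Math.max(byCount, byBytes);
    }

    static long ceilDiv(long a, long b) {
        return (a + b - 1) / b;
    }

    public static void main(String[] args) {
        // 100k events/s at an ASSUMED 200 bytes each.
        System.out.println(minShards(100_000, 200)); // prints 100
    }
}
```

With these assumed numbers the record-count limit dominates (100 shards), while the byte rate alone would need only 20. That gap is what aggregation normally closes, and it is out of reach when each invocation sees a single event.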

I think you should have a look at the following AWS article, which explains how you can connect API Gateway directly to Kinesis. This way, you can avoid the extra Lambda that just forwards your request.

Create an API Gateway API as a Kinesis Proxy
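In a proxy setup like this, the gateway has to turn each incoming event into the JSON body of a Kinesis PutRecord call, whose fields are StreamName, Data (Base64-encoded) and PartitionKey. The sketch below builds that body in plain Java purely to show its shape; in the real integration this is done by a mapping template inside API Gateway, and the class and method names here are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Illustrative only: shows the shape of a PutRecord request body.
// PutRecordBody and build(...) are made-up names, not part of any AWS SDK.
final class PutRecordBody {
    static String build(String streamName, String partitionKey, String eventJson) {
        // Kinesis expects the record payload as a Base64-encoded blob.
        String data = Base64.getEncoder()
                .encodeToString(eventJson.getBytes(StandardCharsets.UTF_8));
        return "{\"StreamName\":\"" + escape(streamName) + "\","
                + "\"Data\":\"" + data + "\","
                + "\"PartitionKey\":\"" + escape(partitionKey) + "\"}";
    }

    // Minimal JSON string escaping for backslashes and double quotes.
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        System.out.println(build("events", "user-1", "{\"a\":1}"));
    }
}
```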

Obviously, if each request coming through AWS API Gateway corresponds to one Kinesis Data Streams record, it makes no sense to use the KPL, as pointed out by Jens. In this case you can call the Kinesis API directly without using Lambda. Alternatively, you may do some additional processing in Lambda and send the data through PutRecord (not PutRecords, which the KPL uses). Your code in Java would look like this:

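import com.amazonaws.ClientConfiguration;
import com.amazonaws.auth.DefaultAWSCredentialsProviderChain;
import com.amazonaws.services.kinesis.AmazonKinesis;
import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder;
import com.amazonaws.services.kinesis.model.PutRecordRequest;
import com.amazonaws.services.kinesis.model.PutRecordResult;

// build the client once
AmazonKinesisClientBuilder clientBuilder = AmazonKinesisClientBuilder.standard();
clientBuilder.setRegion(REGION);
clientBuilder.setCredentials(new DefaultAWSCredentialsProviderChain());
clientBuilder.setClientConfiguration(new ClientConfiguration());
AmazonKinesis kinesisClient = clientBuilder.build();
...
// then later, for each record (data is a ByteBuffer)
PutRecordRequest putRecordRequest = new PutRecordRequest();
putRecordRequest.setStreamName(STREAM_NAME);
putRecordRequest.setData(data);
putRecordRequest.setPartitionKey(daasEvent.getAnonymizedId());
// an explicit hash key overrides the partition key hash when choosing the shard
putRecordRequest.setExplicitHashKey(Utils.randomExplicitHashKey());
putRecordRequest.setSequenceNumberForOrdering(sequenceNumberOfPreviousRecord);
PutRecordResult putRecordResult = kinesisClient.putRecord(putRecordRequest);
// remember the sequence number so the next record is ordered after this one
sequenceNumberOfPreviousRecord = putRecordResult.getSequenceNumber();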
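The snippet above relies on a helper, Utils.randomExplicitHashKey(), whose implementation is not shown. One plausible guess, offered only as an assumption, is that it returns a random value from the 128-bit hash key space, which Kinesis expects as a decimal string. A minimal sketch of such a helper (the class name is hypothetical):

```java
import java.math.BigInteger;
import java.security.SecureRandom;

// Hypothetical stand-in for the answer's Utils.randomExplicitHashKey();
// the original helper is not shown, so this is a guess at its behavior.
final class ExplicitHashKeys {
    private static final SecureRandom RANDOM = new SecureRandom();

    // Returns a uniformly random integer in [0, 2^128 - 1] as a decimal string,
    // which is the format Kinesis uses for explicit hash keys.
    static String random() {
        return new BigInteger(128, RANDOM).toString();
    }

    public static void main(String[] args) {
        System.out.println(random());
    }
}
```

A random explicit hash key spreads records evenly across shards regardless of the partition key, at the price of giving up per-key shard affinity.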

However, there may be cases where using the KPL from Lambda makes sense. For example, the data sent to AWS API Gateway may contain multiple individual records that will be sent to one or more streams. In that case the benefits of the KPL (see https://docs.aws.amazon.com/streams/latest/dev/kinesis-kpl-concepts.html ) still apply, but you have to be aware of the specifics of running it on Lambda, concretely the "issue" pointed out here https://github.com/awslabs/amazon-kinesis-producer/issues/143 , and call

kinesisProducer.flushSync() 

at the end of the insertions, which also worked for me.
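The pattern described here (add every record from the request, then flush synchronously before the handler returns) can be sketched without AWS dependencies as follows. RecordSink is a hypothetical seam that mirrors the KPL's addUserRecord and flushSync calls; a real adapter would delegate to a KinesisProducer instance. The newline-delimited payload format, the in-memory sink and the hash-based partition key are assumptions made so the example can run on its own.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical seam; a real implementation would wrap the KPL's KinesisProducer.
interface RecordSink {
    void addUserRecord(String stream, String partitionKey, ByteBuffer data);
    void flushSync();
}

// In-memory stand-in that buffers records the way an asynchronous producer would.
final class BufferingSink implements RecordSink {
    final List<String> buffered = new ArrayList<>();
    final List<String> delivered = new ArrayList<>();

    public void addUserRecord(String stream, String partitionKey, ByteBuffer data) {
        buffered.add(StandardCharsets.UTF_8.decode(data).toString());
    }

    public void flushSync() {
        delivered.addAll(buffered);
        buffered.clear();
    }
}

final class MultiRecordHandler {
    // Splits an ASSUMED newline-delimited body into records, hands each to the
    // sink, and flushes before returning so nothing is left buffered when the
    // Lambda invocation ends.
    static int handle(String body, RecordSink sink, String stream) {
        int count = 0;
        for (String line : body.split("\n")) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            // Placeholder partition key derived from the record content.
            String partitionKey = Integer.toString(line.hashCode());
            sink.addUserRecord(stream, partitionKey,
                    ByteBuffer.wrap(line.getBytes(StandardCharsets.UTF_8)));
            count++;
        }
        sink.flushSync();
        return count;
    }

    public static void main(String[] args) {
        BufferingSink sink = new BufferingSink();
        int sent = handle("a\nb\n\nc", sink, "events");
        System.out.println(sent + " records, " + sink.buffered.size() + " still buffered");
    }
}
```

Without the final flushSync() call, the three records in this run would still be sitting in the buffer when handle returns, which is the failure mode the linked issue describes for Lambda.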
