
Processing DynamoDB streams using the AWS Java DynamoDB streams Kinesis adapter

I'm attempting to capture DynamoDB table changes using DynamoDB streams and the AWS-provided Java DynamoDB streams Kinesis adapter. I'm working with the AWS Java SDKs in a Scala app.

I started by following the AWS guide and going through the AWS published code example. However, I'm having issues getting Amazon's own published code working in my environment. My issue lies with the KinesisClientLibConfiguration object.

In the example code, KinesisClientLibConfiguration is configured with the stream ARN provided by DynamoDB:

new KinesisClientLibConfiguration("streams-adapter-demo",
    streamArn, 
    streamsCredentials, 
    "streams-demo-worker")

I followed a similar pattern in my Scala app by first locating the current ARN from my Dynamo table:

lazy val streamArn = dynamoClient.describeTable(config.tableName)
  .getTable.getLatestStreamArn

And then creating the KinesisClientLibConfiguration with the provided ARN:

lazy val kinesisConfig: KinesisClientLibConfiguration =
  new KinesisClientLibConfiguration(
    "testProcess",
    streamArn,
    defaultProviderChain,
    "testWorker"
  ).withMaxRecords(1000)
    .withRegionName("eu-west-1")
    .withMetricsLevel(MetricsLevel.NONE)
    .withIdleTimeBetweenReadsInMillis(500)
    .withInitialPositionInStream(InitialPositionInStream.TRIM_HORIZON)

I've verified the provided stream ARN and everything matches what I see in the AWS console.

At runtime I end up getting an exception stating that the provided ARN is not a valid stream name:

com.amazonaws.services.kinesis.clientlibrary.lib.worker.ShardSyncTask call
SEVERE: Caught exception while sync'ing Kinesis shards and leases
com.amazonaws.services.kinesis.model.AmazonKinesisException: 1 validation
error detected: Value 'arn:aws:dynamodb:eu-west-1:STREAM ARN' at
'streamName' failed to satisfy constraint: Member must satisfy regular
expression pattern: [a-zA-Z0-9_.-]+ (Service: AmazonKinesis; Status Code:
400; Error Code: ValidationException; Request ID: )

Looking at the documentation for KinesisClientLibConfiguration, this does make sense: the second parameter is listed as streamName, with no mention of an ARN.
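To see why the request is rejected, the pattern quoted in the exception can be applied directly. This is only a minimal sketch: the object name, the helper, and the example ARN (account ID, table name, timestamp) are made up for illustration and are not part of any AWS library.

```scala
object StreamNameCheck {
  // Pattern copied from the ValidationException message above.
  private val streamNamePattern = "[a-zA-Z0-9_.-]+".r.pattern

  // Hypothetical ARN in the general shape of a DynamoDB stream ARN.
  val exampleArn: String =
    "arn:aws:dynamodb:eu-west-1:123456789012:table/Example/stream/2016-01-01T00:00:00.000"

  // True only if the whole string consists of letters, digits, '_', '.' or '-'.
  def isValidKinesisStreamName(name: String): Boolean =
    streamNamePattern.matcher(name).matches()

  def main(args: Array[String]): Unit = {
    // The ':' and '/' characters in an ARN are outside the allowed set.
    println(isValidKinesisStreamName(exampleArn))
    println(isValidKinesisStreamName("my-kinesis-stream"))
  }
}
```

Any ARN fails this check because of its colons and slashes, so the error only says that the value reached the real Kinesis endpoint, which validates it as a plain stream name.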

I can't seem to find anything on KinesisClientLibConfiguration that is related to an ARN. Since I'm working with a DynamoDB stream and not a Kinesis stream, I'm also unsure how to find my stream name.

At this point I'm unsure what I'm missing from the published AWS example. It seems like they may be using a much older version of the KCL. I'm using version 1.7.0 of amazon-kinesis-client.

The issue actually ended up being outside of my KinesisClientLibConfiguration.

I was able to get around this issue by keeping the same configuration and providing both the stream adapter client included with the DynamoDB stream adapter library and clients for DynamoDB and CloudWatch.

My working solution now looks like this.

Defining the Kinesis client config:

//Kinesis config for DynamoDB streams
lazy val kinesisConfig: KinesisClientLibConfiguration =
  new KinesisClientLibConfiguration(
    getClass.getName, //DynamoDB shard lease table name
    streamArn, //pulled from the dynamo table at runtime
    dynamoCredentials, //DefaultAWSCredentialsProviderChain
    KeywordTrackingActor.NAME //Lease owner name
  ).withMaxRecords(1000) //using AWS recommended value
    .withIdleTimeBetweenReadsInMillis(500) //using AWS recommended value
    .withInitialPositionInStream(InitialPositionInStream.TRIM_HORIZON)

Define a stream adapter and a CloudWatch client:

val streamAdapterClient :AmazonDynamoDBStreamsAdapterClient = new AmazonDynamoDBStreamsAdapterClient(dynamoCredentials)
streamAdapterClient.setRegion(region)

val cloudWatchClient :AmazonCloudWatchClient = new AmazonCloudWatchClient(dynamoCredentials)
cloudWatchClient.setRegion(region)

Create an instance of a RecordProcessorFactory. It's up to you to define a class that implements the KCL-provided IRecordProcessorFactory, along with the IRecordProcessor it returns.

val recordProcessorFactory :RecordProcessorFactory = new RecordProcessorFactory(context, keywordActor, config.keywordColumnName)
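For readers who have not written one yet, a processor and factory could look roughly like the sketch below. It assumes the KCL 1.x v2 interfaces and the RecordAdapter class from the DynamoDB streams adapter library are on the classpath. LoggingRecordProcessor and LoggingRecordProcessorFactory are hypothetical names and are much simpler than the actor-based RecordProcessorFactory above: they only print each change and checkpoint after every batch.

```scala
import com.amazonaws.services.dynamodbv2.streamsadapter.model.RecordAdapter
import com.amazonaws.services.kinesis.clientlibrary.interfaces.v2.{IRecordProcessor, IRecordProcessorFactory}
import com.amazonaws.services.kinesis.clientlibrary.lib.worker.ShutdownReason
import com.amazonaws.services.kinesis.clientlibrary.types.{InitializationInput, ProcessRecordsInput, ShutdownInput}

import scala.collection.JavaConverters._

// Hypothetical processor: one instance is created per shard by the factory below.
class LoggingRecordProcessor extends IRecordProcessor {

  override def initialize(input: InitializationInput): Unit =
    println(s"Starting on shard ${input.getShardId}")

  override def processRecords(input: ProcessRecordsInput): Unit = {
    input.getRecords.asScala.foreach {
      // The adapter hands DynamoDB stream records over wrapped in a RecordAdapter.
      case adapted: RecordAdapter =>
        val change = adapted.getInternalObject
        println(s"${change.getEventName}: ${change.getDynamodb.getKeys}")
      case other =>
        println(s"Unexpected record type: ${other.getClass.getName}")
    }
    // Checkpointing after every batch is the simplest policy, not necessarily the best one.
    input.getCheckpointer.checkpoint()
  }

  override def shutdown(input: ShutdownInput): Unit =
    // Checkpoint only when the shard has been fully consumed.
    if (input.getShutdownReason == ShutdownReason.TERMINATE)
      input.getCheckpointer.checkpoint()
}

// Hypothetical factory: the worker calls createProcessor() for each shard it leases.
class LoggingRecordProcessorFactory extends IRecordProcessorFactory {
  override def createProcessor(): IRecordProcessor = new LoggingRecordProcessor
}
```

A real processor would normally catch and handle the exceptions that checkpoint() can throw rather than letting them propagate.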

And the part I was missing: all of this needs to be provided to your worker.

val worker :Worker =
  new Worker.Builder()
    .recordProcessorFactory(recordProcessorFactory)
    .config(kinesisConfig)
    .kinesisClient(streamAdapterClient)
    .dynamoDBClient(dynamoClient)
    .cloudWatchClient(cloudWatchClient)
    .build()

//this will start record processing
streamExecutorService.submit(worker)
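The snippet above does not show where streamExecutorService comes from. One possible setup, sketched here under the assumption that a single background thread is enough, relies only on the fact that the KCL Worker is a Runnable. WorkerRunner and its method names are hypothetical, and any Runnable can stand in for the worker.

```scala
import java.util.concurrent.{ExecutorService, Executors, TimeUnit}

object WorkerRunner {

  // Starts the given Runnable (for example a KCL Worker) on its own thread.
  def runInBackground(worker: Runnable): ExecutorService = {
    val executor = Executors.newSingleThreadExecutor()
    executor.submit(worker)
    executor
  }

  // Stops accepting new tasks and waits for the running one to finish.
  // Returns true if the executor terminated within the timeout.
  def stop(executor: ExecutorService, timeoutSeconds: Long): Boolean = {
    executor.shutdown()
    executor.awaitTermination(timeoutSeconds, TimeUnit.SECONDS)
  }
}
```

With a real worker, calling worker.shutdown() before stop(...) asks the KCL to finish its loop; otherwise the worker keeps running and stop(...) simply times out.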

Alternatively, instead of com.amazonaws.services.kinesis.clientlibrary.lib.worker.Worker you can use com.amazonaws.services.dynamodbv2.streamsadapter.StreamsWorker, which internally uses the AmazonDynamoDBStreamsAdapterClient.

For example:

lazy val kinesisConfig: KinesisClientLibConfiguration =
  new KinesisClientLibConfiguration(
    getClass.getName, //DynamoDB shard lease table name
    streamArn, //pulled from the dynamo table at runtime
    dynamoCredentials, //DefaultAWSCredentialsProviderChain
    KeywordTrackingActor.NAME //Lease owner name
  ).withMaxRecords(1000) //using AWS recommended value
    .withIdleTimeBetweenReadsInMillis(500) //using AWS recommended value
    .withInitialPositionInStream(InitialPositionInStream.TRIM_HORIZON)

val worker = new com.amazonaws.services.dynamodbv2.streamsadapter.StreamsWorker(recordProcessorFactory, kinesisConfig)

Just to answer what the problem was: you provided an ARN where a stream name was expected.

I recently contributed a PR to the gfc-aws-kinesis project, and you can now use it by just passing the adapter and writing a KinesisRecordAdapter implementation.

In the example I'm using Scanamo to parse the hashmap.

Create the client:

val streamAdapterClient: AmazonDynamoDBStreamsAdapterClient =
    new AmazonDynamoDBStreamsAdapterClient()

Pass it in the configuration:

val streamConfig = KinesisStreamConsumerConfig[Option[A]](
  applicationName,
  config.stream, //the full dynamodb stream arn
  regionName = Some(config.region),
  checkPointInterval = config.checkpointInterval,
  initialPositionInStream = config.streamPosition,
  dynamoDBKinesisAdapterClient = Some(streamAdapterClient)
)
KinesisStreamSource(streamConfig).mapMaterializedValue(_ => NotUsed)

Create an implicit record reader suitable for reading DynamoDB events:

implicit val kinesisRecordReader
  : KinesisRecordReader[Option[A]] =
  new KinesisRecordReader[Option[A]] {
    override def apply(record: Record): Option[A] = {
      record match {
        case recordAdapter: RecordAdapter =>
          val dynamoRecord: DynamoRecord =
            recordAdapter.getInternalObject
          dynamoRecord.getEventName match {
            case "INSERT" =>
              ScanamoFree
                .read[A](
                  dynamoRecord.getDynamodb.getNewImage)
                .toOption
            case _ => None
          }
        case _ => None
      }
    }
  }
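The reader above only handles "INSERT" and drops everything else. DynamoDB stream records also carry the event names "MODIFY" and "REMOVE", so one way to make the dispatch explicit is a small ADT, sketched here. StreamEvents, ChangeType and its cases are hypothetical names and are not part of gfc-aws-kinesis or Scanamo.

```scala
object StreamEvents {

  // The three kinds of change a DynamoDB stream record can describe.
  sealed trait ChangeType
  case object Inserted extends ChangeType
  case object Modified extends ChangeType
  case object Removed extends ChangeType

  // Maps the record's event name to a ChangeType; unknown or null names give None.
  def parse(eventName: String): Option[ChangeType] = eventName match {
    case "INSERT" => Some(Inserted)
    case "MODIFY" => Some(Modified)
    case "REMOVE" => Some(Removed)
    case _        => None
  }
}
```

Inside the reader, matching on StreamEvents.parse(dynamoRecord.getEventName) instead of on the raw string lets the compiler warn when a case is left unhandled.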
