
Aggregating messages from Kafka in a storm bolt based on a key

Background

A backend log processing system is already in place, with Kafka and Storm clusters.

Use Case

Multiple events of a certain type X are generated and logged on the backend side. Each of them contains an id, say userid. These events are consumed by a Storm bolt, which extracts the userid and another field, say userdata, and writes them to another Kafka topic, say the data topic.
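A minimal sketch of the extraction step follows. It assumes each type-X event arrives as a JSON string with userid and userdata fields; the JSON format, the field names and the to_data_record helper are all hypothetical. Returning the userid as the record key is the useful part: Kafka's default partitioner sends records with the same non-null key to the same partition, so each user's records stay together in the data topic.

```python
import json
from typing import Tuple


def to_data_record(raw_event: str) -> Tuple[str, str]:
    """Turn one type-X event into a (key, value) pair for the data topic.

    Assumes a JSON event with 'userid' and 'userdata' fields (hypothetical format).
    """
    event = json.loads(raw_event)
    userid = str(event["userid"])
    # Keep only the fields the downstream topology needs; drop everything else.
    value = json.dumps({"userid": userid, "userdata": event["userdata"]})
    return userid, value


key, value = to_data_record('{"userid": 42, "userdata": "clicked", "ts": 1}')
print(key, value)
```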

Another topology consumes from this data topic. It finds multiple such events with a single userid and different userdata. If there are n such records present, then some action needs to be taken.
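Counting per userid only works if every record for a given userid reaches the same bolt instance. In Storm that is the job of a fields grouping on the userid field (for example fieldsGrouping("source", new Fields("userid")) when wiring the topology, where "source" is a hypothetical component id). The sketch below merely simulates the idea with a stable hash so it runs on its own; task_for is a hypothetical helper, not Storm's actual routing code.

```python
import zlib
from collections import defaultdict


def task_for(userid: str, num_tasks: int) -> int:
    """Pick a bolt task for a key, imitating a fields grouping (hypothetical helper).

    crc32 is used instead of hash() because Python randomizes str hashes per process.
    """
    return zlib.crc32(userid.encode("utf-8")) % num_tasks


records = [("u1", "a"), ("u2", "b"), ("u1", "c"), ("u3", "d"), ("u1", "e")]
per_task = defaultdict(list)
for userid, userdata in records:
    per_task[task_for(userid, 4)].append((userid, userdata))

# Every record of a given userid lands in the same task's list.
for task, items in sorted(per_task.items()):
    print(task, items)
```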

Problem

How can data from Kafka be aggregated by key in a Storm bolt? Some users may reach a count of N records in 20 minutes, while others may take a few hours, depending on how they interact with the system and therefore on how many events are logged on the backend side. The goal is to get all userids and the corresponding userdata once the count of such records reaches N.
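One way to do the aggregation is to buffer userdata per userid inside the bolt and release the batch once it holds N records. This is a minimal, Storm-independent sketch; CountTrigger is a hypothetical class, and in a real bolt its add() would be called from execute() for each incoming tuple. The state lives only in memory, so it is lost if the worker restarts.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class CountTrigger:
    """Buffer userdata per userid and release the batch once it holds n records (hypothetical)."""

    def __init__(self, n: int) -> None:
        if n < 1:
            raise ValueError("n must be >= 1")
        self.n = n
        self.buffers: Dict[str, List[str]] = defaultdict(list)

    def add(self, userid: str, userdata: str) -> Optional[Tuple[str, List[str]]]:
        buf = self.buffers[userid]
        buf.append(userdata)
        if len(buf) >= self.n:
            # Threshold reached: hand the batch back and forget this userid.
            del self.buffers[userid]
            return userid, buf
        return None


trigger = CountTrigger(n=3)
for userid, userdata in [("u1", "a"), ("u2", "x"), ("u1", "b"), ("u1", "c")]:
    result = trigger.add(userid, userdata)
    if result is not None:
        print("take action for", result)  # take action for ('u1', ['a', 'b', 'c'])
```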

This is not a Storm-specific problem but one of user session management. If you expect your system to face many sessions that take a long time to reach a certain status (n events in your case) and build up a lot of data in the meantime, you need to account for that in your design: choose n wisely, and build integration tests around it that check that your system stays responsive under load.
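To see why the choice of n matters, here is a back-of-envelope estimate. With in-bolt buffering, each open session holds at most n - 1 records before it triggers, so worst-case memory grows linearly with both the number of open sessions and n. The helper name and all numbers below are illustrative assumptions, not measurements.

```python
def worst_case_buffered_bytes(open_sessions: int, n: int, bytes_per_record: int) -> int:
    """Upper bound on memory held by sessions that have not reached n records yet."""
    # A session that has not triggered holds at most n - 1 records.
    return open_sessions * (n - 1) * bytes_per_record


# Illustrative only: 1M open sessions, n = 50, 200 bytes per buffered record.
print(worst_case_buffered_bytes(1_000_000, 50, 200) / 1e9, "GB")
```

Doubling either the number of open sessions or n roughly doubles the bound, which is why load tests should cover the slowest-converging users rather than the average case.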

You could

  • consider making n dynamic, based on load and statistics (I guess that's what DevOps is all about)
  • store only the count n for each userid, persist the data in a database or on the filesystem, and read it back once n reaches the critical value. This somewhat contradicts the idea of a streaming topology, but it starts to make sense, especially if you need to persist (parts of) the data after processing it.
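A sketch of the second suggestion, assuming a simple append/read-back store interface. InMemoryStore is a hypothetical stand-in so the example runs; a real deployment would back it with whatever database or filesystem the system already uses. CountOnlyTrigger is likewise hypothetical; the point is that the bolt keeps only one integer per userid.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class InMemoryStore:
    """Stand-in for a database or filesystem store (hypothetical; swap in a real client)."""

    def __init__(self) -> None:
        self._rows: Dict[str, List[str]] = defaultdict(list)

    def append(self, userid: str, userdata: str) -> None:
        self._rows[userid].append(userdata)

    def pop_all(self, userid: str) -> List[str]:
        return self._rows.pop(userid, [])


class CountOnlyTrigger:
    """Keep only a counter per userid; the userdata itself goes to the store."""

    def __init__(self, n: int, store: InMemoryStore) -> None:
        self.n = n
        self.store = store
        self.counts: Dict[str, int] = defaultdict(int)

    def add(self, userid: str, userdata: str) -> Optional[Tuple[str, List[str]]]:
        self.store.append(userid, userdata)  # persist first, then count
        self.counts[userid] += 1
        if self.counts[userid] >= self.n:
            del self.counts[userid]
            # Read the data back only when the threshold is reached.
            return userid, self.store.pop_all(userid)
        return None


store = InMemoryStore()
trigger = CountOnlyTrigger(2, store)
trigger.add("u1", "a")
print(trigger.add("u1", "b"))  # ('u1', ['a', 'b'])
```

If the counters must survive worker restarts, they could be kept in the same store instead of in the bolt.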

Note that if you want to deploy updates to your software, you also need to consider a time value t besides n, since the shorter the sessions, the easier it is to deploy continuously.
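A sketch of bounding each session by a time value t as well as by n, assuming each call supplies a timestamp (for example taken from the event). CountOrTimeoutTrigger is hypothetical; expire() would need to be called periodically, for instance when the bolt receives a Storm tick tuple. What to do with a timed-out session (drop it or act on partial data) is a design decision, so expire() simply returns those sessions to the caller.

```python
from typing import Dict, List, Optional, Tuple


class CountOrTimeoutTrigger:
    """Close a userid's session at n records or once it is at least t seconds old (hypothetical)."""

    def __init__(self, n: int, t: float) -> None:
        self.n = n
        self.t = t
        # userid -> (session start time, buffered userdata)
        self.sessions: Dict[str, Tuple[float, List[str]]] = {}

    def add(self, userid: str, userdata: str, now: float) -> Optional[Tuple[str, List[str]]]:
        _started, buf = self.sessions.setdefault(userid, (now, []))
        buf.append(userdata)
        if len(buf) >= self.n:
            del self.sessions[userid]
            return userid, buf
        return None

    def expire(self, now: float) -> List[Tuple[str, List[str]]]:
        # Collect keys first so the dict is not modified while iterating over it.
        expired = [u for u, (started, _) in self.sessions.items() if now - started >= self.t]
        return [(u, self.sessions.pop(u)[1]) for u in expired]


trigger = CountOrTimeoutTrigger(n=3, t=60)
trigger.add("u1", "a", now=0)
trigger.add("u2", "x", now=30)
print(trigger.expire(now=60))  # [('u1', ['a'])]
```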
