
Analyze messages from Kafka consumer

I set up a Kafka consumer-producer system, and I need to process the transmitted messages. The messages are lines from a JSON file, and each one arrives as a record like this:

ConsumerRecord(topic=u'json_data103052', partition=0, offset=676, timestamp=1542710197257, timestamp_type=0, key=None, value='{"Name": "Simone", "Surname": "Zimbolli", "gender": "Other", "email": "zzz@uiuc.edu", "country": "Nigeria", "date": "11/07/2018"}', checksum=354265828, serialized_key_size=-1, serialized_value_size=189)

I am looking for an easy-to-implement way to:

  • Define a streaming window
  • Analyze the messages in the window (for example, count the number of unique users)

Does anybody have suggestions on how to proceed? Thanks.

I am having issues with Spark, so I would prefer to avoid it. I am scripting in Python using Jupyter.

Here is my code:

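from kafka import KafkaConsumer
from random import randint
from time import sleep

bootstrap_servers = ['localhost:9092']

%store -r topicName    # IPython magic: restore the topic name stored by the Kafka producer notebook
print(topicName)

# Subscribe to the topic on the local broker
consumer = KafkaConsumer(topicName, bootstrap_servers=bootstrap_servers)

# Print every record as it arrives (this loop blocks until interrupted)
for message in consumer:
    print(message)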

The Kafka Streams API is probably what you need. It provides the windowing features you are asking for. You can find more info about Kafka Streams here:


Kafka Streams seems suitable for your scenario. It supports four types of windows:

  • Tumbling time window: time-based; fixed-size, non-overlapping, gap-less windows
  • Hopping time window: time-based; fixed-size, overlapping windows
  • Sliding time window: time-based; fixed-size, overlapping windows that work on differences between record timestamps
  • Session window: session-based; dynamically sized, non-overlapping, data-driven windows
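To make the difference between the first two window types concrete, here is a minimal pure-Python sketch of how a record timestamp maps to windows. The function names and the choice to align windows to timestamp zero are assumptions made for illustration; this is not the Kafka Streams API itself.

```python
def tumbling_window(ts_ms, size_ms):
    """Return the single [start, end) tumbling window that contains ts_ms."""
    start = ts_ms - ts_ms % size_ms
    return (start, start + size_ms)


def hopping_windows(ts_ms, size_ms, advance_ms):
    """Return every [start, end) hopping window that contains ts_ms.

    Windows are size_ms long and a new one starts every advance_ms,
    so a record falls into several windows when advance_ms < size_ms.
    """
    windows = []
    start = ts_ms - ts_ms % advance_ms  # latest window start at or before ts_ms
    while start > ts_ms - size_ms and start >= 0:
        windows.append((start, start + size_ms))
        start -= advance_ms
    return sorted(windows)


print(tumbling_window(7, 10))      # (0, 10)
print(hopping_windows(7, 10, 5))   # [(0, 10), (5, 15)]
```

A tumbling window is the special case of a hopping window whose advance equals its size, which is why each record then belongs to exactly one window.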

For Python, there is a library that may be useful to you: https://github.com/wintoncode/winton-kafka-streams
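If a streams library is more than you need, the windowing can also be done by hand inside the consumer loop. The following is a rough sketch, not a tested production approach: it counts unique `email` values per one-minute tumbling window using each record's `timestamp` (milliseconds) and JSON `value`. The `Record` tuple is a hypothetical stand-in for kafka-python's `ConsumerRecord` so the example runs without a broker, and the window size and the `email` key are assumptions.

```python
import json
from collections import namedtuple

# Hypothetical stand-in for ConsumerRecord, reduced to the two fields used here.
Record = namedtuple("Record", ["timestamp", "value"])


def tumbling_counts(messages, size_ms=60000, key="email"):
    """Yield (window_start_ms, unique_count) each time a window closes.

    Assumes timestamps arrive in non-decreasing order; a late record is
    simply counted in the window that is currently open.
    """
    current_start = None
    users = set()
    for message in messages:
        start = message.timestamp - (message.timestamp % size_ms)
        if current_start is None:
            current_start = start
        elif start > current_start:
            yield current_start, len(users)
            current_start, users = start, set()
        users.add(json.loads(message.value)[key])
    if current_start is not None:
        yield current_start, len(users)  # flush the last, still-open window


sample = [
    Record(0, '{"email": "a@example.com"}'),
    Record(10, '{"email": "b@example.com"}'),
    Record(60000, '{"email": "a@example.com"}'),
]
for window_start, count in tumbling_counts(sample):
    print(window_start, count)
```

With a live consumer you would pass the `KafkaConsumer` object itself as `messages`. Because that loop never ends, the final flush only matters for finite inputs such as the sample list.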
