Extract particular data from Kafka topic

I'm doing real-time streaming of Twitter data and wonder whether there is a way to extract only the messages and certain values from a Kafka topic.

You can use ksqlDB to do this. For example:

ksql> CREATE STREAM TWEETS WITH (KAFKA_TOPIC='twitter_01', VALUE_FORMAT='Avro');

ksql> SELECT USER->SCREENNAME, TEXT FROM TWEETS WHERE TEXT LIKE '%cool%' EMIT CHANGES;

+-------------------+------------------------------------------------------------------------------------------+
|USER__SCREENNAME   |TEXT                                                                                      |
+-------------------+------------------------------------------------------------------------------------------+
|MobileGist         |This is super cool!! Great work @houchens_kim!                                            |

You can also write the results to a new topic if you want:

ksql> CREATE STREAM COOL_TWEETS AS SELECT USER->SCREENNAME, TEXT FROM TWEETS WHERE TEXT LIKE '%cool%' EMIT CHANGES;

Since you tagged Python, it's worth pointing out that you can call ksqlDB through its REST API from Python. Here's an example.
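The linked example isn't reproduced here, so what follows is only a minimal sketch of the idea, not that example. It assumes a ksqlDB server reachable at http://localhost:8088 (an assumption, adjust for your setup) and posts the query to the `/query` endpoint using nothing but the standard library. The helper names `build_query_request` and `stream_query` are made up for illustration.

```python
import json
import urllib.request

# Content type used by the ksqlDB REST API.
KSQL_CONTENT_TYPE = "application/vnd.ksql.v1+json"


def build_query_request(sql, server="http://localhost:8088"):
    """Build (but do not send) a POST request for the ksqlDB /query endpoint.

    `server` is an assumed address; point it at your own ksqlDB server.
    """
    payload = {"ksql": sql, "streamsProperties": {}}
    return urllib.request.Request(
        url=server.rstrip("/") + "/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": KSQL_CONTENT_TYPE, "Accept": KSQL_CONTENT_TYPE},
        method="POST",
    )


def stream_query(sql, server="http://localhost:8088"):
    """Send the query and yield each non-empty line of the streamed response."""
    with urllib.request.urlopen(build_query_request(sql, server)) as response:
        for raw_line in response:
            line = raw_line.decode("utf-8").strip()
            if line:
                yield line


# Usage (needs a running ksqlDB server, so it is left commented out):
# sql = "SELECT USER->SCREENNAME, TEXT FROM TWEETS WHERE TEXT LIKE '%cool%' EMIT CHANGES;"
# for line in stream_query(sql):
#     print(line)
```

Because the query ends in EMIT CHANGES, the response keeps streaming until you stop reading, so the generator above never finishes on its own.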

Ref: Exploring ksqlDB with Twitter Data

You didn't mention what type of data you are receiving. Tweets, yes, but as CSV? JSON? Avro? Protobuf?

The short answer is "yes". Just as you can open a text file and read data out of it, you can get data out of a Kafka record. The records just happen to be streaming in constantly.
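To make that concrete, here is a rough sketch that assumes the tweets arrive as JSON with a top-level `text` field and a nested `user.screen_name` field. Those field names are an assumption about your data, and for CSV, Avro or Protobuf the decoding step would differ. The consumer part additionally assumes the kafka-python package is installed and a broker is listening on localhost:9092; `extract_fields` and `consume_matching` are illustrative names.

```python
import json


def extract_fields(raw_value, keyword=None):
    """Decode one record value and keep only the screen name and text.

    Returns None when `keyword` is given and the text does not contain it.
    The JSON layout (text, user.screen_name) is an assumption.
    """
    tweet = json.loads(raw_value.decode("utf-8"))
    text = tweet.get("text", "")
    if keyword is not None and keyword not in text:
        return None
    return {
        "screen_name": tweet.get("user", {}).get("screen_name"),
        "text": text,
    }


def consume_matching(topic, keyword, servers="localhost:9092"):
    """Yield the extracted fields of every matching record on `topic`.

    Imported lazily so the parsing logic above works without kafka-python.
    """
    from kafka import KafkaConsumer  # assumes kafka-python is installed

    consumer = KafkaConsumer(topic, bootstrap_servers=servers)
    for record in consumer:  # blocks and keeps yielding as records arrive
        fields = extract_fields(record.value, keyword)
        if fields is not None:
            yield fields


# Usage (needs a running broker, so it is left commented out):
# for fields in consume_matching("twitter_01", "cool"):
#     print(fields["screen_name"], fields["text"])
```

Keeping the parsing in its own function means you can try it on a sample record without a broker before wiring it to the consumer.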
