
Does the Kafka Python API support stream processing?

I have used Kafka Streams in Java, but I could not find a similar API in Python. Does Apache Kafka support stream processing in Python?

Kafka Streams is only available as a JVM library, but there are at least two Python implementations of it.

In theory, you could try Jython or Py4J to drive the JVM implementation. Otherwise you're limited to the plain consumer/producer APIs, or to calling the KSQL REST interface and its built-in SQL functions if you don't want to write your own UDFs (which are, again, Java only, last I checked).
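As a rough illustration of the plain consumer/producer route, here is a minimal consume-transform-produce loop. It assumes the kafka-python client (my choice, not something named above); the topic names, the group id, and the transform() step are all hypothetical placeholders.

```python
import json


def transform(event):
    """Hypothetical per-record step: drop events without a user, upper-case the rest."""
    if "user" not in event:
        return None
    return {**event, "user": event["user"].upper()}


def run(bootstrap_servers="localhost:9092", source="input-topic", sink="output-topic"):
    # Assumption: the kafka-python package is installed. It is imported here
    # so that transform() can be used and tested without a Kafka client.
    from kafka import KafkaConsumer, KafkaProducer

    consumer = KafkaConsumer(
        source,
        bootstrap_servers=bootstrap_servers,
        group_id="python-stream-sketch",  # hypothetical consumer group
        value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    )
    producer = KafkaProducer(
        bootstrap_servers=bootstrap_servers,
        value_serializer=lambda value: json.dumps(value).encode("utf-8"),
    )

    # Iterating the consumer blocks and yields records as they arrive.
    for message in consumer:
        result = transform(message.value)
        if result is not None:
            producer.send(sink, result)
```

Calling run() starts the loop against a local broker. Note that a loop like this gives you none of what Kafka Streams adds on top, such as state stores, windowing, or joins; you would have to build those yourself.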

Outside of those options, you can also try Apache Beam, Flink, or Spark, but each requires an external cluster scheduler to scale out.

If you are using Apache Spark, you can use it as both producer and consumer. There is no need to rely on third-party libraries like Faust, but you will need a Spark cluster manager (Standalone, YARN, or Kubernetes) to scale it out.

To consume Kafka data streams in Spark, follow the Structured Streaming + Kafka Integration Guide.

Keep in mind that you have to add the spark-sql-kafka package when using spark-submit:

spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.1 StructuredStreaming.py

This solution has been tested with PySpark on Spark 3.0.1 and Kafka 2.7.0.
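For orientation, a script such as StructuredStreaming.py might look roughly like the sketch below. It is not the original author's script: the broker address, the topic name "events", and the helper function names are assumptions, and the console sink is used only to make the output visible.

```python
def kafka_source_options(bootstrap_servers, topic, starting_offsets="latest"):
    """Options understood by Spark's built-in "kafka" streaming source."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": starting_offsets,
    }


def run(bootstrap_servers="localhost:9092", topic="events"):
    # PySpark is imported here so the helper above works without Spark installed.
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("StructuredStreaming").getOrCreate()

    # Subscribe to the topic as an unbounded streaming DataFrame.
    raw = (
        spark.readStream.format("kafka")
        .options(**kafka_source_options(bootstrap_servers, topic))
        .load()
    )

    # Kafka keys and values arrive as binary columns; cast them to strings.
    decoded = raw.select(
        col("key").cast("string"),
        col("value").cast("string"),
    )

    # Print each micro-batch to the console until the job is stopped.
    query = decoded.writeStream.format("console").outputMode("append").start()
    query.awaitTermination()
```

With a call to run() added as the last line, the file can be submitted with the spark-submit command shown earlier. To write results back to Kafka instead of the console, the sink format would be "kafka" with a target topic and a checkpoint location, as described in the integration guide.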

This resource can also be useful.

Previously, no KStream Python API was available, but one is now available through the kstreams Python library: https://pypi.org/project/kstreams/

Features:

  1. Produce events
  2. Consume events with Streams
  3. Prometheus metrics and custom monitoring
  4. TestClient
  5. Custom Serialization and Deserialization
  6. Easy to integrate with any async framework. Not tied to any library!!
  7. Yield events from streams
  8. Store (kafka streams pattern)
  9. Stream Join
  10. Windowing
