

Does the Kafka Python API support stream processing?

I have used Kafka Streams in Java, but I could not find a similar API in Python. Does Apache Kafka support stream processing in Python?

Kafka Streams is only available as a JVM library, but there are at least two third-party Python implementations of it, Faust being one of them.

In theory, you could try Jython or Py4J to drive the JVM implementation. Otherwise, you're limited to the plain consumer/producer clients, or to invoking the KSQL REST interface and using its built-in SQL functions if you don't want to write your own UDFs (which, last I checked, are Java-only).
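If you take the plain consumer/producer route, a stream processor boils down to a consume-transform-produce loop. Below is a minimal sketch, assuming the confluent-kafka client (`pip install confluent-kafka`); the function names, topics, and consumer group are hypothetical. The loop only relies on `poll`, `produce`, and `flush`, so it can be exercised with fakes. For simplicity it stops at the first empty poll, whereas a long-running service would keep polling.

```python
from typing import Callable, Optional


def process_until_idle(consumer, producer, out_topic: str,
                       transform: Callable[[bytes], Optional[bytes]],
                       poll_timeout: float = 1.0) -> int:
    """Consume records, transform their values, and produce the results.

    `consumer`/`producer` are expected to behave like confluent_kafka's
    Consumer/Producer. Returns the number of records produced.
    """
    produced = 0
    while True:
        msg = consumer.poll(poll_timeout)
        if msg is None:            # nothing arrived within the timeout: stop (sketch only)
            break
        if msg.error():            # skip error events instead of crashing
            continue
        value = msg.value()
        if value is None:          # tombstone record: nothing to transform
            continue
        result = transform(value)
        if result is not None:     # returning None filters the record out
            producer.produce(out_topic, key=msg.key(), value=result)
            produced += 1
    producer.flush()               # block until queued messages are delivered
    return produced


def run(bootstrap_servers: str, in_topic: str, out_topic: str) -> int:
    """Hypothetical wiring against a real broker (requires confluent-kafka)."""
    from confluent_kafka import Consumer, Producer  # imported lazily

    consumer = Consumer({
        "bootstrap.servers": bootstrap_servers,
        "group.id": "uppercase-processor",   # hypothetical consumer group
        "auto.offset.reset": "earliest",
    })
    consumer.subscribe([in_topic])
    producer = Producer({"bootstrap.servers": bootstrap_servers})
    try:
        # Longer timeout so the first poll survives the initial group rebalance.
        return process_until_idle(consumer, producer, out_topic,
                                  bytes.upper, poll_timeout=10.0)
    finally:
        consumer.close()
```

For example, `run("localhost:9092", "input", "output")` would upper-case every value from `input` into `output`. Note that this loop gives you none of Kafka Streams' state stores, windowing, or exactly-once guarantees by itself.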

Beyond those options, you can also try Apache Beam, Flink, or Spark, but each of them requires an external cluster scheduler to scale out.

If you are using Apache Spark, you can use it as both a Kafka producer and a Kafka consumer. There is no need to rely on third-party libraries like Faust, but you will need a Spark cluster manager (Standalone, YARN, or Kubernetes) to scale it out.

To consume Kafka data streams in Spark, follow the Structured Streaming + Kafka Integration Guide.

Keep in mind that you have to add the spark-sql-kafka package when using spark-submit:

spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.1 StructuredStreaming.py

This solution has been tested with Spark 3.0.1 and Kafka 2.7.0 using PySpark.
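For reference, the core of a `StructuredStreaming.py` script could look roughly like the sketch below: it reads a topic through the Structured Streaming Kafka source, upper-cases each value, and writes the result to another topic through the Kafka sink. The broker address, topic names, and checkpoint path are hypothetical placeholders, and pyspark is imported lazily so the option-building helper can be checked without Spark installed.

```python
def kafka_source_options(bootstrap_servers: str, topic: str,
                         starting_offsets: str = "latest") -> dict:
    """Options for the Structured Streaming "kafka" source."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": starting_offsets,
    }


def start_uppercase_pipeline(spark, bootstrap_servers: str, in_topic: str,
                             out_topic: str, checkpoint_dir: str):
    """Start a streaming query that upper-cases values from in_topic into out_topic."""
    from pyspark.sql.functions import col, upper  # imported lazily

    source = (spark.readStream.format("kafka")
              .options(**kafka_source_options(bootstrap_servers, in_topic))
              .load())
    # Kafka keys and values arrive as binary; cast them to strings first.
    transformed = source.select(
        col("key").cast("string").alias("key"),
        upper(col("value").cast("string")).alias("value"),
    )
    # The Kafka sink needs a "value" column and a checkpoint location.
    return (transformed.writeStream.format("kafka")
            .option("kafka.bootstrap.servers", bootstrap_servers)
            .option("topic", out_topic)
            .option("checkpointLocation", checkpoint_dir)
            .start())


def main() -> None:
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("StructuredStreaming").getOrCreate()
    query = start_uppercase_pipeline(spark, "localhost:9092", "input", "output",
                                     "/tmp/checkpoints/uppercase")
    query.awaitTermination()  # block until the streaming query stops
```

When the script is launched with `spark-submit --packages ...`, it would call `main()` at module level; without the spark-sql-kafka package on the classpath, `format("kafka")` fails to resolve.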

This resource can also be useful.

Previously, no Kafka Streams-style Python API was available, but there is now one in the new kstreams Python library: https://pypi.org/project/kstreams/

Features:

  1. Produce events
  2. Consume events with Streams
  3. Prometheus metrics and custom monitoring
  4. TestClient
  5. Custom serialization and deserialization
  6. Easy to integrate with any async framework; not tied to any library
  7. Yield events from streams
  8. Store (Kafka Streams pattern)
  9. Stream join
  10. Windowing
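As a rough illustration of the kstreams programming model, here is a sketch based on the engine/decorator style shown in the project's README (`create_engine`, `@engine.stream(...)`, `engine.send`, `engine.start`/`engine.stop`). Treat those calls as assumptions to verify against the version you install, since the API has evolved. The topic names and the `enrich` helper are hypothetical, and kstreams is imported lazily so the pure transformation can be tested without a broker.

```python
import asyncio
import json


def enrich(payload: bytes) -> bytes:
    """Hypothetical pure transformation: tag each JSON event as processed."""
    event = json.loads(payload)
    event["processed"] = True
    return json.dumps(event).encode()


def build_engine(in_topic: str, out_topic: str):
    """Wire `enrich` into a kstreams engine (requires `pip install kstreams`)."""
    from kstreams import ConsumerRecord, create_engine  # imported lazily

    engine = create_engine(title="enrich-engine")

    @engine.stream(in_topic)
    async def consume(cr: ConsumerRecord):
        # Transform each consumed record and forward it to out_topic.
        await engine.send(out_topic, value=enrich(cr.value), key=cr.key)

    return engine


async def main() -> None:
    engine = build_engine("local--events", "local--events-enriched")
    await engine.start()
    try:
        await asyncio.Event().wait()  # keep running until the task is cancelled
    finally:
        await engine.stop()
```

With the engine's default connection settings, `asyncio.Run(main())` is not the call to use; start it with `asyncio.run(main())`, and the registered stream consumes in the background while `main` waits.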
