Spark Structured Streaming raises an error when accessing Kafka over SSL

I want to read data from a Kafka cluster that uses a self-signed certificate.

My consumer is as follows:

from pyspark.sql import SparkSession
from pyspark.sql.functions import from_json
from pyspark.sql.types import StructType, StringType, IntegerType

if __name__ == '__main__':
    spark = SparkSession \
        .builder \
        .appName("pyspark_structured_streaming_kafka") \
        .getOrCreate()

    df_raw = spark.readStream \
        .format("kafka") \
        .option("kafka.ssl.truststore.location", "file:///Users/picomy/Kafka-keystore/server.truststore") \
        .option("kafka.ssl.truststore.password", "aoeuid") \
        .option("kafka.ssl.keystore.location", "file:///Users/picomy/Kafka-keystore/kclient.keystore") \
        .option("kafka.ssl.keystore.password", "aoeuid") \
        .option("kafka.isolation.level", "read_committed") \
        .option("kafka.bootstrap.servers", "52.81.249.81:9093") \
        .option("subscribe", "product") \
        .option("startingOffsets", "latest") \
        .option("kafka.ssl.endpoint.identification.algorithm", "") \
        .load()

    product_schema = StructType() \
        .add("product_name", StringType()) \
        .add("product_factory", StringType()) \
        .add("yield_num", IntegerType()) \
        .add("yield_time", StringType())    

    # Parse the JSON payload and print each micro-batch to the console.
    query = df_raw.selectExpr("CAST(value AS STRING)") \
        .select(from_json("value", product_schema).alias("data")) \
        .select("data.*") \
        .writeStream \
        .format("console") \
        .outputMode("append") \
        .option("checkpointLocation", "file:///Users/picomy/Kafka-Output/checkpoint") \
        .start()

    # Block until the streaming query stops.
    query.awaitTermination()

When I submit the job, I get the following error:

21/02/04 17:33:58 WARN NetworkClient: [Consumer clientId=consumer-spark-kafka-source-bdbf46eb-42ce-4fd1-bef7-08222138b49c-32919539-executor-1, groupId=spark-kafka-source-bdbf46eb-42ce-4fd1-bef7-08222138b49c-32919539-executor] Bootstrap broker 52.81.249.81:9093 (id: -1 rack: null) disconnected

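My code works fine against port 9092 (the PLAINTEXT listener). Thanks to Mike for his expertise here.

The Kafka console consumer works fine with the same certificates: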

./kafka-console-consumer.sh --bootstrap-server 52.81.249.81:9093 --topic product --consumer.config ../config/consumer.properties
{"product_name": "X Laptop","product_factory": "B-3231","yield_num": 899,"yield_time": "20210201 22:00:01"}
{"product_name": "X Laptop","product_factory": "B-3231","yield_num": 899,"yield_time": "20210201 22:00:01"}
{"product_name": "X Laptop","product_factory": "B-3231","yield_num": 899,"yield_time": "20210201 22:00:01"}
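The console consumer takes its settings from consumer.properties, while the Spark job only sees what is passed as kafka.-prefixed options, so it can help to compare the two side by side. Below is a minimal sketch of that translation, assuming the file uses plain key=value lines. The helper name spark_kafka_options and the sample file contents are hypothetical (the actual consumer.properties is not shown in this post). The skipped keys are the ones the Spark Kafka integration guide lists as not settable on the source.

```python
# Keys the Spark Kafka source manages itself and does not accept as options.
UNSUPPORTED_KEYS = {
    "group.id",
    "auto.offset.reset",
    "key.deserializer",
    "value.deserializer",
    "enable.auto.commit",
    "interceptor.classes",
}


def spark_kafka_options(properties_text):
    """Turn Kafka client properties into Spark option names (hypothetical helper)."""
    options = {}
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith(("#", "!")):
            continue
        key, separator, value = line.partition("=")
        if not separator:
            continue
        key = key.strip()
        if key in UNSUPPORTED_KEYS:
            continue
        # Spark forwards options prefixed with "kafka." to the Kafka client.
        options["kafka." + key] = value.strip()
    return options


# Hypothetical file contents, for illustration only.
sample_properties = """
# consumer.properties
bootstrap.servers=52.81.249.81:9093
group.id=test-consumer-group
security.protocol=SSL
ssl.endpoint.identification.algorithm=
"""

for name, value in sorted(spark_kafka_options(sample_properties).items()):
    print(name, "=", repr(value))
```

Any key that shows up in this output but is missing from the readStream options is a candidate for the difference in behaviour between the two clients.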

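I don't know what the root cause of this problem is.

I added one more option so that the Kafka client talks to the broker over SSL. Without it the client falls back to the default security protocol, PLAINTEXT, and the kafka.ssl.* settings are not used: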

.option("kafka.security.protocol","SSL")
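To keep the security protocol from being left out again, the SSL-related options can be collected in one place. This is only a sketch: the helper name kafka_ssl_options is hypothetical, and the values are the ones from the job above.

```python
def kafka_ssl_options(bootstrap_servers, truststore_path, truststore_password,
                      keystore_path, keystore_password):
    """Collect the Kafka source options for an SSL listener (hypothetical helper)."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        # Must be set explicitly; the Kafka client default is PLAINTEXT.
        "kafka.security.protocol": "SSL",
        "kafka.ssl.truststore.location": truststore_path,
        "kafka.ssl.truststore.password": truststore_password,
        "kafka.ssl.keystore.location": keystore_path,
        "kafka.ssl.keystore.password": keystore_password,
        # An empty value disables hostname verification, as in the job above.
        "kafka.ssl.endpoint.identification.algorithm": "",
    }


options = kafka_ssl_options(
    "52.81.249.81:9093",
    "file:///Users/picomy/Kafka-keystore/server.truststore",
    "aoeuid",
    "file:///Users/picomy/Kafka-keystore/kclient.keystore",
    "aoeuid",
)

# In the Spark job the dict would be passed in a single call, for example:
#   spark.readStream.format("kafka").options(**options) \
#       .option("subscribe", "product").load()
```

Passing the dict through options(**options) is equivalent to chaining one .option(...) call per key.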
