How do I connect to a Kerberos-secured Kafka cluster with Spark Structured Streaming?
I am trying to connect to a Kerberos-secured Kafka cluster using the Structured Streaming API. Below is my code and Spark's output. I don't see any exceptions, only repeated warnings that the client was disconnected. What are the next steps for troubleshooting this?
import org.apache.spark.sql.SparkSession
import org.apache.log4j.{Logger, Level}

object Main {
  def main(args: Array[String]) {
    Logger.getLogger("org").setLevel(Level.WARN)
    Logger.getLogger("akka").setLevel(Level.WARN)

    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("myapp")
      .config("spark.executor.extraJavaOptions", "java.security.auth.login.config=jaas.conf")
      .getOrCreate()

    import spark.implicits._

    val lines = spark.readStream.format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9100,broker2:9100")
      .option("security.protocol", "SASL_PLAINTEXT")
      .option("sasl.kerberos.service.name", "mysvcname")
      .option("subscribe", "mytopic")
      .load()

    val query = lines.select("value").writeStream.format("console").start()
    query.awaitTermination()
  }
}
Here is the output:
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
19/02/11 17:15:06 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/02/11 17:15:10 WARN NetworkClient: [Consumer clientId=consumer-1, groupId=spark-kafka-source-cef02569-ab16-4ca2-a9e8-18bcea992c0d--1359730493-driver-0] Bootstrap broker broker2:9100 (id: -2 rack: null) disconnected
19/02/11 17:15:11 WARN NetworkClient: [Consumer clientId=consumer-1, groupId=spark-kafka-source-cef02569-ab16-4ca2-a9e8-18bcea992c0d--1359730493-driver-0] Bootstrap broker broker1:9100 (id: -1 rack: null) disconnected
19/02/11 17:15:11 WARN NetworkClient: [Consumer clientId=consumer-1, groupId=spark-kafka-source-cef02569-ab16-4ca2-a9e8-18bcea992c0d--1359730493-driver-0] Bootstrap broker broker2:9100 (id: -2 rack: null) disconnected
19/02/11 17:15:11 WARN NetworkClient: [Consumer clientId=consumer-1, groupId=spark-kafka-source-cef02569-ab16-4ca2-a9e8-18bcea992c0d--1359730493-driver-0] Bootstrap broker broker1:9100 (id: -1 rack: null) disconnected
19/02/11 17:15:11 WARN NetworkClient: [Consumer clientId=consumer-1, groupId=spark-kafka-source-cef02569-ab16-4ca2-a9e8-18bcea992c0d--1359730493-driver-0] Bootstrap broker broker1:9100 (id: -1 rack: null) disconnected
19/02/11 17:15:11 WARN NetworkClient: [Consumer clientId=consumer-1, groupId=spark-kafka-source-cef02569-ab16-4ca2-a9e8-18bcea992c0d--1359730493-driver-0] Bootstrap broker broker2:9100 (id: -2 rack: null) disconnected
...
I found my problem. When specifying the security protocol option, the option name must be prefixed with "kafka.". This is confusing because for an ordinary Kafka consumer the option is simply security.protocol, but when configuring Spark's Kafka source, bootstrap.servers, security.protocol (and any other consumer options/properties you need) must all carry the kafka. prefix. My original code was:
.option("security.protocol", "SASL_PLAINTEXT")
The correct option is:
.option("kafka.security.protocol", "SASL_PLAINTEXT")
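This prefix handling can be illustrated with a small, self-contained sketch (not Spark's actual source code; the object and method names below are made up for illustration): the Kafka data source forwards only the options whose keys start with "kafka." to the underlying consumer, with the prefix stripped, so an unprefixed security.protocol never reaches the Kafka client at all.

```scala
// Simplified sketch of the "kafka."-prefix convention used by the
// Spark Kafka source; names here are illustrative, not Spark internals.
object KafkaOptionPrefix {
  // Keep only "kafka."-prefixed keys and strip the prefix, yielding
  // the properties that would be handed to the Kafka consumer.
  def consumerParams(options: Map[String, String]): Map[String, String] =
    options.collect {
      case (key, value) if key.startsWith("kafka.") =>
        key.stripPrefix("kafka.") -> value
    }

  def main(args: Array[String]): Unit = {
    val opts = Map(
      "subscribe" -> "mytopic",                      // consumed by Spark itself
      "kafka.bootstrap.servers" -> "broker1:9100",   // reaches the consumer
      "security.protocol" -> "SASL_PLAINTEXT",       // unprefixed: dropped silently
      "kafka.security.protocol" -> "SASL_PLAINTEXT"  // reaches the consumer
    )
    println(consumerParams(opts))
  }
}
```

The silent drop explains the symptom above: no exception, just a plaintext consumer failing its handshake against SASL brokers and logging "disconnected".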
Here is the complete working code:
import org.apache.spark.sql.SparkSession
import org.apache.log4j.{Level, Logger}

object Main {
  def main(args: Array[String]) {
    Logger.getLogger("org").setLevel(Level.INFO)
    Logger.getLogger("akka").setLevel(Level.INFO)

    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("myapp")
      .config("spark.executor.extraJavaOptions", "java.security.auth.login.config=c:/krb/jaas.conf")
      .getOrCreate()

    import spark.implicits._

    val lines = spark.readStream.format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9100,broker2:9100")
      .option("kafka.security.protocol", "SASL_PLAINTEXT")
      .option("subscribe", "mytopic")
      .load()

    val query = lines.select("value").writeStream.format("console").start()
    query.awaitTermination()
  }
}
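Once the connection works, note that the console sink will print value as raw bytes, because the Kafka source exposes key and value as binary columns. In the streaming query you would typically cast them, e.g. lines.selectExpr("CAST(value AS STRING)"). For a UTF-8 payload that cast amounts to a byte-to-string decode, as this standalone sketch shows (the payload text is made up):

```scala
import java.nio.charset.StandardCharsets

object DecodeValue {
  // Equivalent of Spark SQL's CAST(value AS STRING) for a single
  // Kafka record payload: decode the bytes as UTF-8.
  def valueAsString(value: Array[Byte]): String =
    new String(value, StandardCharsets.UTF_8)

  def main(args: Array[String]): Unit = {
    val raw = "hello from mytopic".getBytes(StandardCharsets.UTF_8)
    println(valueAsString(raw)) // prints "hello from mytopic"
  }
}
```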
For reference, here are the contents of the jaas.conf file:
KafkaClient {
  com.sun.security.auth.module.Krb5LoginModule required
  useKeyTab=true
  keyTab="c:/krb/mykeytab.keytab"
  principal="myaccount@mydomain.int"
  storeKey=true
  useTicketCache=false
  serviceName="myservicename";
};
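One caveat on wiring up this file: the JVM reads java.security.auth.login.config as a system property, so when it is passed through extraJavaOptions it normally needs the -D prefix (-Djava.security.auth.login.config=c:/krb/jaas.conf). As an alternative sketch, the property can also be set programmatically before the first Kafka consumer is created; the path below is just the one from the jaas.conf example above:

```scala
object JaasSetup {
  def main(args: Array[String]): Unit = {
    // Alternative to -Djava.security.auth.login.config on the command line:
    // set the system property in code before any Kafka consumer is built.
    System.setProperty("java.security.auth.login.config", "c:/krb/jaas.conf")
    println(System.getProperty("java.security.auth.login.config"))
  }
}
```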