How to get and print one row from Kafka with pyspark? Queries with streaming sources must be executed with writeStream.start()
I am trying to read some data from Kafka to see what is in it. I wrote:
from pyspark.sql import SparkSession

builder = SparkSession.builder \
    .appName("PythonTest01")
spark = builder.getOrCreate()

# Subscribe to 1 topic
df = spark \
    .readStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", config["kafka"]["bootstrap.servers"]) \
    .option("subscribe", dataFlowTopic) \
    .load()

# df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
df.printSchema()

df = df.first()

query = df \
    .writeStream \
    .outputMode('complete') \
    .format('console') \
    .start()

query.awaitTermination()
Unfortunately, it complains:
pyspark.sql.utils.AnalysisException: Queries with streaming sources must be executed with writeStream.start();
What does it want, and how do I satisfy it?
If I remove first(), it complains:
Complete output mode not supported when there are no streaming aggregations on streaming DataFrames/Datasets;
If I write
#df = df.first()

query = df \
    .writeStream \
    .outputMode('append') \
    .format('console') \
    .start()

query.awaitTermination()
this prints the last row rather than the first, and it does not terminate.
"and does not terminate"
It is a stream; it is not meant to terminate.
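If the goal is only to peek at the console output for a short while, one option is to bound the wait and then stop the query explicitly. The following is a minimal sketch, not the asker's code: the helper name run_console_stream and the 10-second default are assumptions chosen for illustration, and df is expected to be the streaming DataFrame built with readStream in the question.

```python
def run_console_stream(df, timeout_seconds=10):
    """Print a streaming DataFrame to the console for a bounded time.

    `df` is assumed to be a streaming DataFrame (created via readStream).
    Returns True if the query ended on its own within the timeout,
    False if it was still running and had to be stopped here.
    """
    query = (
        df.writeStream
        .outputMode("append")   # no aggregation, so 'complete' is not allowed
        .format("console")
        .start()
    )
    # With a timeout, awaitTermination returns whether the query terminated
    # within that many seconds instead of blocking forever.
    finished = query.awaitTermination(timeout_seconds)
    if not finished:
        query.stop()
    return finished
```

Calling run_console_stream(df, 30) would let the console sink print whatever micro-batches arrive during roughly 30 seconds and then return control to the script.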
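"prints the last row rather than the first"
See the startingOffsets option. For streaming queries the default is latest, so only records that arrive after the query starts are read.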
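To look at a single existing record, one possible approach is a batch read (spark.read instead of spark.readStream), where first() is an ordinary action and the job ends by itself. The sketch below is written under assumptions: the helper names kafka_options and first_kafka_row are made up for illustration, the Kafka connector package for Spark is assumed to be on the classpath, and the spark session, broker address and topic come from the caller. With several partitions, the row returned is simply one row of the topic, not necessarily the oldest record overall.

```python
def kafka_options(bootstrap_servers, topic, starting_offsets="earliest"):
    """Build the Kafka source options used below (hypothetical helper)."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        # "earliest" reads from the beginning of the topic instead of
        # only records that arrive after the query starts.
        "startingOffsets": starting_offsets,
    }


def first_kafka_row(spark, bootstrap_servers, topic):
    """Batch-read a topic and return one row, or None if the topic is empty."""
    df = (
        spark.read                      # batch reader, not readStream
        .format("kafka")
        .options(**kafka_options(bootstrap_servers, topic))
        .load()
        # key and value arrive as binary; cast them so they print readably
        .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
    )
    return df.first()
```

With the names from the question this would be called as print(first_kafka_row(spark, config["kafka"]["bootstrap.servers"], dataFlowTopic)). The same options dictionary can also be passed to spark.readStream.format("kafka").options(...) if a streaming query should replay the topic from the beginning.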