Spark Structured Streaming: process multi-line input
I have to process logs where a log message from Kafka is made up of multiple lines separated by line breaks. How can I process this in Spark Structured Streaming?
Use the text data source (TextFileFormat) with the wholetext option to load each file as a single row (i.e. without splitting on "\n").
spark
.readStream
.option("wholetext", true)
.text("files/")
.writeStream
.format("console")
.start
With the kafka data source it's even easier, because the entire message is simply in the value column.
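A minimal sketch of the Kafka case, assuming Spark 2.2+ with the spark-sql-kafka-0-10 package on the classpath; the broker address ("localhost:9092") and topic name ("logs") are placeholders, and splitLogLines is a hypothetical helper showing how to recover the individual lines once the whole message is in the value column:

```scala
import org.apache.spark.sql.SparkSession

object MultiLineLogs {
  // A multi-line Kafka message arrives whole in the value column;
  // split it back into individual lines only if you need them.
  def splitLogLines(message: String): Seq[String] =
    message.split("\n").toSeq

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("multi-line-logs")
      .getOrCreate()

    spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // placeholder broker
      .option("subscribe", "logs")                         // placeholder topic
      .load()
      .selectExpr("CAST(value AS STRING) AS message")      // line breaks preserved
      .writeStream
      .format("console")
      .start()
      .awaitTermination()
  }
}
```

Because each Kafka record is one row, no wholetext-style option is needed here; the line breaks inside the message survive the cast to string.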