
Spark Structured Streaming: process multi-line input

I have to process logs where a log message from Kafka is made up of multiple lines, separated by line breaks. How can I process this in Spark Structured Streaming?

Use the text data source (TextFileFormat) with the wholetext option to load each file as a single row (i.e. without splitting on "\n").

spark
  .readStream
  .option("wholetext", true) // load each file as a single row instead of one row per line
  .text("files/")            // monitor the files/ directory for new text files
  .writeStream
  .format("console")
  .start
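
With wholetext enabled, each file arrives as one row in the value column, so a multi-line log message stays intact and can be parsed as a whole.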

With the kafka data source it's even easier, because the entire message is simply in the value column.
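
Below is a minimal sketch of that approach, assuming a broker at localhost:9092 and a topic named logs (both hypothetical names) and that the spark-sql-kafka package is on the classpath. Each Kafka message becomes one row, so the multi-line log stays together in value and can be split into lines afterwards if needed.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.split

val spark = SparkSession.builder().appName("multiline-logs").getOrCreate()
import spark.implicits._

spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092") // hypothetical broker address
  .option("subscribe", "logs")                         // hypothetical topic name
  .load()
  .select($"value".cast("string").as("message"))       // whole multi-line log per Kafka message
  .withColumn("lines", split($"message", "\n"))        // optional: break the message into lines
  .writeStream
  .format("console")
  .start()
  .awaitTermination()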

