
Spark Structured Streaming: process multi-line input

I have to process logs where a log message from Kafka is made up of multiple lines, separated by line breaks. How can I process this in Spark Structured Streaming?

Use the text data source (TextFileFormat) with the wholetext option to load each file as a single row (i.e. without splitting on "\n").

spark
  .readStream
  .option("wholetext", true) // load each file as a single row instead of one row per line
  .text("files/")            // monitor the files/ directory for new text files
  .writeStream
  .format("console")
  .start
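
With wholetext enabled, each file arrives as one row in the value column, so a multi-line log message stays intact and can be parsed as a whole.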

With the kafka data source it's even easier, because the entire message is simply in the value column.
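
Below is a minimal sketch of that approach, assuming a broker at localhost:9092 and a topic named logs (both hypothetical names) and that the spark-sql-kafka package is on the classpath. Each Kafka message becomes one row, so the multi-line log stays together in value and can be split into lines afterwards if needed.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.split

val spark = SparkSession.builder().appName("multiline-logs").getOrCreate()
import spark.implicits._

spark
  .readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092") // hypothetical broker address
  .option("subscribe", "logs")                         // hypothetical topic name
  .load()
  .select($"value".cast("string").as("message"))       // whole multi-line log per Kafka message
  .withColumn("lines", split($"message", "\n"))        // optional: break the message into lines
  .writeStream
  .format("console")
  .start()
  .awaitTermination()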

