Stream Event Hub fixed-length data to a streaming DataFrame
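Summary: I have an Event Hub as a streaming source, and the data arrives in fixed-length format. I want to read that fixed-length stream into a Spark DataFrame.
Note: when the fixed-length data comes from a directory, I can read it, take substrings, and split it into fields as needed. How do I do the same for a streaming source such as Event Hub, which delivers all the data in the Body column?
Let's say my fixed-length file is sample.txt, which contains: 00101292017you1235
My code is as follows: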
// Run in a notebook or spark-shell, where `spark` and `spark.implicits._` (for the $"..." syntax) are already in scope.
import org.apache.spark.eventhubs._
import org.apache.spark.sql.types._
import org.apache.spark.sql.functions._

val endpoint = "Endpoint=sb://XXXXXX.servicebus.windows.net/;SharedAccessKeyName=RootManageSharedAccessKey;SharedAccessKey=XXXXXXX"
val eventHub = "XXXX"
val connectionString = ConnectionStringBuilder(endpoint)
  .setEventHubName(eventHub)
  .build
val ehConf = EventHubsConf(connectionString)
  .setStartingPosition(EventPosition.fromEndOfStream)
  .setMaxEventsPerTrigger(500)
val ehStream = spark.readStream.format("eventhubs").options(ehConf.toMap).load()

// The body column arrives as binary, so cast it to a string to make it readable.
val messages = ehStream
  .withColumn("Offset", $"offset".cast(LongType))
  .withColumn("Time (readable)", $"enqueuedTime".cast(TimestampType))
  .withColumn("Timestamp", $"enqueuedTime".cast(LongType))
  .withColumn("Body", $"body".cast(StringType))
  .select("Offset", "Time (readable)", "Timestamp", "Body")

messages.writeStream
  .outputMode("append")
  .option("truncate", false)
  .format("console")
  .start()
  .awaitTermination()
For the scenario above, how can I split the body data received from Event Hub (00101292017you1235) into columns?
Something like this (PySpark):
df.select(
    df.value.substr(1, 3).alias('id'),
    df.value.substr(4, 8).alias('date'),
    df.value.substr(12, 3).alias('string'),
    df.value.substr(15, 4).cast('integer').alias('integer')
).show()
will result in:
+---+--------+------+-------+
| id| date|string|integer|
+---+--------+------+-------+
|001|01292017| you| 1234|
|002|01302017| me| 5678|
+---+--------+------+-------+
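For reference, the (start, length) pairs above are 1-based, as in Spark's substr. A minimal plain Scala sketch of the same layout, checked against the sample record, might look like this. FixedRecord, FixedWidth, and the field names are hypothetical and not part of any library:

```scala
import scala.util.Try

// Hypothetical record type for the four fixed-width fields in the question.
final case class FixedRecord(id: String, date: String, text: String, number: Int)

object FixedWidth {
  // (name, 1-based start, length), mirroring the substr(start, length) calls above.
  val layout: Seq[(String, Int, Int)] = Seq(
    ("id", 1, 3),
    ("date", 4, 8),
    ("string", 12, 3),
    ("integer", 15, 4)
  )

  // Slice one line into raw string fields; None when the line is too short.
  def slice(line: String): Option[Map[String, String]] = {
    val needed = layout.map { case (_, start, len) => start - 1 + len }.max
    if (line.length < needed) None
    else Some(layout.map { case (name, start, len) =>
      // String.substring is 0-based with an exclusive end index.
      name -> line.substring(start - 1, start - 1 + len)
    }.toMap)
  }

  // Build a typed record; None when the numeric field does not parse.
  def parse(line: String): Option[FixedRecord] =
    slice(line).flatMap { f =>
      Try(f("integer").trim.toInt).toOption.map { n =>
        FixedRecord(f("id"), f("date"), f("string"), n)
      }
    }
}
```

Calling FixedWidth.parse("00101292017you1235") splits the sample into 001, 01292017, you, and 1235, which is a quick way to confirm the offsets before wiring them into the stream.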
This helped:
val messages = ehStream.withColumn("FirstColumn", $"body".substr(1, 3).cast(StringType)).select("FirstColumn")
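Extending the same idea to all four fields could look like the sketch below. It assumes the layout from the question (3, 8, 3, and 4 characters) and casts body to a string before slicing. FixedLengthBody and splitBody are hypothetical names, and a static one-row DataFrame stands in for ehStream so the example runs locally without an Event Hub:

```scala
import org.apache.spark.sql.{Column, DataFrame, SparkSession}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{IntegerType, StringType}

object FixedLengthBody {
  // Split the binary `body` column into four fixed-width columns.
  // Column.substr takes a 1-based start position and a length.
  def splitBody(df: DataFrame): DataFrame = {
    val text: Column = col("body").cast(StringType)
    df.select(
      text.substr(1, 3).as("id"),
      text.substr(4, 8).as("date"),
      text.substr(12, 3).as("string"),
      text.substr(15, 4).cast(IntegerType).as("integer")
    )
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("fixed-length-demo")
      .getOrCreate()
    import spark.implicits._

    // Static stand-in for the stream: one row with a binary `body`, as Event Hubs delivers it.
    val sample = Seq("00101292017you1235".getBytes("UTF-8")).toDF("body")
    splitBody(sample).show()

    spark.stop()
  }
}
```

On the real stream, calling splitBody(ehStream) in place of the withColumn call should give the same columns, since a select with substr is a per-row projection that Structured Streaming supports.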