Issue storing AVRO Kafka streams to File System

I want to store my AVRO Kafka streams to the file system in delimited format using the Spark Streaming API with the following Scala code, but I am facing some challenges in achieving this:

    record.write.mode(SaveMode.Append).csv("/Users/Documents/kafka-poc/consumer-out/")

Since record (a GenericRecord) is not a DataFrame or an RDD, I am not sure how to proceed with this.

Code

    val messages = SparkUtilsScala.createCustomDirectKafkaStreamAvro(ssc, kafkaParams, zookeeper_host, kafkaOffsetZookeeperNode, topicsSet)
    val requestLines = messages.map(_._2)
    requestLines.foreachRDD((rdd, time: Time) => {
      rdd.foreachPartition { partitionOfRecords =>
        // Injection used to turn the raw Avro bytes back into GenericRecords
        val recordInjection = SparkUtilsJava.getRecordInjection(topicsSet.last)
        for (avroLine <- partitionOfRecords) {
          val record = recordInjection.invert(avroLine).get
          println("Consumer output...." + record)
          println("Consumer output schema...." + record.getSchema)
        }
      }
    })

Following is the output and the schema:

{"username": "Str 1-0", "tweet": "Str 2-0", "timestamp": 0}
{"type":"record","name":"twitter_schema","fields":[{"name":"username","type":"string"},{"name":"tweet","type":"string"},{"name":"timestamp","type":"int"}]}

Thanks in advance; I appreciate your help.

I found a solution for this.

    // GenericRecord.toString() renders the record as a JSON string,
    // which Spark can read back as a DataFrame and then write as CSV.
    val jsonStrings: RDD[String] = sc.parallelize(Seq(record.toString()))
    val result = sqlContext.read.json(jsonStrings).toDF()
    result.write.mode("Append").csv("/Users/Documents/kafka-poc/consumer-out/")
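
One caveat worth noting: sc.parallelize and sqlContext can only be used on the driver, not inside foreachPartition (which runs on the executors). Below is a minimal sketch of how the JSON-based conversion might be wired into the streaming loop from the question, assuming the same SparkUtilsJava.getRecordInjection helper and topicsSet are available and using an illustrative output path:

    requestLines.foreachRDD { rdd =>
      // Deserialize each Avro payload to a GenericRecord on the executors and
      // render it as a JSON string (GenericRecord.toString() emits JSON).
      val jsonRdd = rdd.mapPartitions { partitionOfRecords =>
        val recordInjection = SparkUtilsJava.getRecordInjection(topicsSet.last)
        partitionOfRecords.map(avroLine => recordInjection.invert(avroLine).get.toString)
      }

      if (!jsonRdd.isEmpty()) {
        // Let Spark infer the schema from the JSON strings, then append as CSV (Spark 2.x).
        val df = sqlContext.read.json(jsonRdd)
        df.write.mode("append").csv("/Users/Documents/kafka-poc/consumer-out/")
      }
    }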
