Issue storing AVRO Kafka streams to File System

I want to store my AVRO Kafka streams to the file system in delimited format using the Spark Streaming API with the following Scala code, but I am facing some challenges in achieving this:

    record.write.mode(SaveMode.Append).csv("/Users/Documents/kafka-poc/consumer-out/")

Since record (a GenericRecord) is not a DataFrame or an RDD, I am not sure how to proceed with this.

Code

    val messages = SparkUtilsScala.createCustomDirectKafkaStreamAvro(ssc, kafkaParams, zookeeper_host, kafkaOffsetZookeeperNode, topicsSet)
    val requestLines = messages.map(_._2)
    requestLines.foreachRDD((rdd, time: Time) => {
      rdd.foreachPartition { partitionOfRecords =>
        // Injection used to turn the raw Avro bytes back into GenericRecords
        val recordInjection = SparkUtilsJava.getRecordInjection(topicsSet.last)
        for (avroLine <- partitionOfRecords) {
          val record = recordInjection.invert(avroLine).get
          println("Consumer output...." + record)
          println("Consumer output schema...." + record.getSchema)
        }
      }
    })

Following is the output and the schema:

{"username": "Str 1-0", "tweet": "Str 2-0", "timestamp": 0}
{"type":"record","name":"twitter_schema","fields":[{"name":"username","type":"string"},{"name":"tweet","type":"string"},{"name":"timestamp","type":"int"}]}

Thanks in advance; I appreciate your help.

I found a solution for this.

    // GenericRecord.toString() renders the record as a JSON string,
    // which Spark can read back as a DataFrame and then write as CSV.
    val jsonStrings: RDD[String] = sc.parallelize(Seq(record.toString()))
    val result = sqlContext.read.json(jsonStrings).toDF()
    result.write.mode("Append").csv("/Users/Documents/kafka-poc/consumer-out/")
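
One caveat worth noting: sc.parallelize and sqlContext can only be used on the driver, not inside foreachPartition (which runs on the executors). Below is a minimal sketch of how the JSON-based conversion might be wired into the streaming loop from the question, assuming the same SparkUtilsJava.getRecordInjection helper and topicsSet are available and using an illustrative output path:

    requestLines.foreachRDD { rdd =>
      // Deserialize each Avro payload to a GenericRecord on the executors and
      // render it as a JSON string (GenericRecord.toString() emits JSON).
      val jsonRdd = rdd.mapPartitions { partitionOfRecords =>
        val recordInjection = SparkUtilsJava.getRecordInjection(topicsSet.last)
        partitionOfRecords.map(avroLine => recordInjection.invert(avroLine).get.toString)
      }

      if (!jsonRdd.isEmpty()) {
        // Let Spark infer the schema from the JSON strings, then append as CSV (Spark 2.x).
        val df = sqlContext.read.json(jsonRdd)
        df.write.mode("append").csv("/Users/Documents/kafka-poc/consumer-out/")
      }
    }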
