Converting a Kafka topic into a Spark DataFrame and writing it to HDFS
I am trying to create a Kafka consumer in Spark, but I get an exception while creating it. My goal is to read from the topic and write the data to an HDFS path.
scala> df2.printSchema()
root
|-- key: binary (nullable = true)
|-- value: binary (nullable = true)
|-- topic: string (nullable = true)
|-- partition: integer (nullable = true)
|-- offset: long (nullable = true)
|-- timestamp: timestamp (nullable = true)
|-- timestampType: integer (nullable = true)
scala> print(df1)
[key: binary, value: binary ... 5 more fields]
I have not sent any input to the topic, yet the DataFrame still shows these fields as if they were input.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.StructType
import org.apache.spark.sql.types.StringType
import org.apache.spark.sql.types.StructField

object Read {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("spark Oracle Kafka")
      .master("local")
      .getOrCreate()
    // The implicits can only be imported once `spark` exists.
    import spark.implicits._

    val df2 = spark
      .read
      .format("kafka")
      .option("kafka.bootstrap.servers", "kafka server ip address i have given")
      .option("subscribe", "topic20190904")
      .load()
    print(df2) // this returns some values
    df2.show() // this throws an exception; I suspect it is not a DataFrame
    df2.write.parquet("/user/xrrn5/abcd") // I am getting java.lang.AbstractMethodError
  }
}
java.lang.AbstractMethodError at org.apache.spark.internal.Logging$class.initializeLogIfNecessary(Logging.scala)
To write data from Kafka to HDFS you don't actually need any code. You can just use Kafka Connect, which is part of Apache Kafka. Here's an example configuration:
{
  "name": "hdfs-sink",
  "config": {
    "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
    "tasks.max": "1",
    "topics": "test_hdfs",
    "hdfs.url": "hdfs://localhost:9000",
    "flush.size": "3",
    "name": "hdfs-sink"
  }
}
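A configuration like this is normally registered by POSTing it to the `/connectors` endpoint of the Kafka Connect REST API. The following is only a minimal sketch of that step, kept in Scala to match the question. It assumes a Connect worker listening on `http://localhost:8083` (the default REST port, but an assumption about your setup), Java 11 or later for `java.net.http`, and that the HDFS sink connector plugin is already installed on the worker. The object and method names are hypothetical.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

object SubmitConnector {
  // The example connector configuration from the answer, as a JSON string.
  val connectorJson: String =
    """{
      |  "name": "hdfs-sink",
      |  "config": {
      |    "connector.class": "io.confluent.connect.hdfs.HdfsSinkConnector",
      |    "tasks.max": "1",
      |    "topics": "test_hdfs",
      |    "hdfs.url": "hdfs://localhost:9000",
      |    "flush.size": "3",
      |    "name": "hdfs-sink"
      |  }
      |}""".stripMargin

  // Builds the POST request that registers a connector with a Connect worker.
  def buildRequest(connectBaseUrl: String, json: String): HttpRequest =
    HttpRequest.newBuilder()
      .uri(URI.create(s"$connectBaseUrl/connectors"))
      .header("Content-Type", "application/json")
      .POST(HttpRequest.BodyPublishers.ofString(json))
      .build()

  def main(args: Array[String]): Unit = {
    // Assumption: the Connect worker runs locally on the default REST port.
    val request = buildRequest("http://localhost:8083", connectorJson)
    val response = HttpClient.newHttpClient()
      .send(request, HttpResponse.BodyHandlers.ofString())
    println(s"${response.statusCode()} ${response.body()}")
  }
}
```

Any HTTP client would do here; the only parts that matter are the `/connectors` path, the JSON content type, and the request body.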
See here for documentation on the connector, and here for a general introduction and overview of using Kafka Connect.
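If you would rather stay in Spark, two points about the question may help. First, the Kafka source always exposes the same fixed schema (`key`, `value`, `topic`, `partition`, `offset`, `timestamp`, `timestampType`), so those columns appear even when nothing has been produced to the topic; they describe Kafka records, not your payload. Second, an `AbstractMethodError` raised from `Logging$class` often points to a version mismatch between the Spark runtime and the `spark-sql-kafka-0-10` connector jar (Spark or Scala version), rather than to a problem in the code itself; that is a likely cause, not a confirmed diagnosis. The sketch below shows a batch read that casts the binary columns to strings and writes Parquet. The broker address `localhost:9092` and the names `KafkaToHdfs` and `decode` are placeholders, not values from the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object KafkaToHdfs {
  // Kafka delivers key and value as binary; cast both to strings
  // so the Parquet output is readable.
  def decode(raw: DataFrame): DataFrame =
    raw
      .withColumn("key", col("key").cast("string"))
      .withColumn("value", col("value").cast("string"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-to-hdfs")
      .master("local[*]")
      .getOrCreate()

    // Batch read of everything currently in the topic.
    val raw = spark.read
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // placeholder broker address
      .option("subscribe", "topic20190904")
      .option("startingOffsets", "earliest")
      .option("endingOffsets", "latest")
      .load()

    // Output path taken from the question.
    decode(raw).write.mode("append").parquet("/user/xrrn5/abcd")

    spark.stop()
  }
}
```

For this to run, the `spark-sql-kafka-0-10` artifact on the classpath has to be built for the same Spark and Scala versions as the cluster or shell you are using.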