
Hive data to Kafka topic using Spark

I am trying to write data from a Hive table to a Kafka topic using Spark.

I am writing a data frame of about 9 million records per day to a Kafka topic with the following query:

val ds = df.selectExpr("topic", "CAST(key AS STRING)", "CAST(value AS STRING)")
  .write
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
  .start()

Is this query capable of writing that volume of data to the Kafka topic?

If yes, roughly how long would it take to finish writing the data?

If not, what other ways are there to do it?

You can use batch processing if the task is to perform the above operation daily rather than in real time.

9 million records can be handled easily this way.

The time required depends on the cluster configuration and on any intermediate processing that is needed.
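As a rough illustration, a minimal sketch of such a daily batch job might look like the following. The table name my_db.my_table, the columns id and payload, the topic my_topic, and the broker list are placeholders, not anything from the original question. Note that a batch write ends in save(); start() applies only to streaming queries started via writeStream.

import org.apache.spark.sql.SparkSession

object HiveToKafkaBatch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-to-kafka-batch")
      .enableHiveSupport()   // needed to read Hive tables
      .getOrCreate()

    // Read the Hive table (placeholder name) and shape it into the
    // key/value string schema that the Kafka sink expects.
    val df = spark.table("my_db.my_table")
      .selectExpr("CAST(id AS STRING) AS key", "CAST(payload AS STRING) AS value")

    // Batch write to Kafka: format("kafka") with save(), not start().
    df.write
      .format("kafka")
      .option("kafka.bootstrap.servers", "host1:port1,host2:port2")
      .option("topic", "my_topic")   // placeholder topic
      .save()

    spark.stop()
  }
}

A job like this can then be submitted once a day with spark-submit from whatever scheduler you already use (cron, Oozie, Airflow, etc.).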
