Stream reading from a database using Spark Streaming

I want to use Spark Streaming to read data from an RDBMS such as MySQL.

But I don't know how to do this using JavaStreamingContext:

JavaStreamingContext jssc = new JavaStreamingContext(conf, Durations.milliseconds(500));
DataFrame df = jssc. ??

I searched the internet but didn't find anything.

Thank you in advance.

You cannot do it like that without installing some third-party software.
What you can do is create a custom receiver that does what you want, combining the Spark SQL and Spark Streaming packages.
Implement a class extending Receiver and, inside it, do all the connections and queries needed to pull the data from the DB.
I am at work now, so I'll give you some links instead of writing the code, sorry:
http://spark.apache.org/docs/latest/streaming-custom-receivers.html
https://medium.com/@anicolaspp/spark-custom-streaming-sources-e7d52da72e80
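
As a rough illustration of the receiver idea above, here is a minimal sketch of the polling loop such a receiver might run. It is deliberately free of Spark and JDBC dependencies so that it compiles on its own. `Row`, `IncrementalPoller`, and the assumption that the table has a monotonically increasing `id` column are all hypothetical and not part of the original answer.

```java
import java.util.List;
import java.util.function.Consumer;
import java.util.function.LongFunction;

/** One row pulled from the table; `id` is assumed to be a monotonically increasing key. */
final class Row {
    final long id;
    final String payload;

    Row(long id, String payload) {
        this.id = id;
        this.payload = payload;
    }
}

/**
 * Hypothetical helper: remembers the highest id seen so far and, on each
 * poll, asks the source only for rows with a larger id.
 */
final class IncrementalPoller {
    // In a real receiver this could wrap a JDBC query such as
    // "SELECT id, payload FROM events WHERE id > ? ORDER BY id" (table name assumed).
    private final LongFunction<List<Row>> fetchNewerThan;
    // In a real receiver this could be the receiver's own store(...) method.
    private final Consumer<Row> sink;
    private long lastSeenId;

    IncrementalPoller(long startAfterId,
                      LongFunction<List<Row>> fetchNewerThan,
                      Consumer<Row> sink) {
        this.lastSeenId = startAfterId;
        this.fetchNewerThan = fetchNewerThan;
        this.sink = sink;
    }

    /** Runs one poll, forwards every new row to the sink, and returns how many were forwarded. */
    int pollOnce() {
        List<Row> rows = fetchNewerThan.apply(lastSeenId);
        for (Row row : rows) {
            sink.accept(row);
            lastSeenId = Math.max(lastSeenId, row.id);
        }
        return rows.size();
    }

    long lastSeenId() {
        return lastSeenId;
    }
}
```

Inside a class extending `Receiver<Row>`, `onStart()` could start a thread that calls `pollOnce()` in a loop (sleeping between polls) until `isStopped()` returns true, passing `this::store` as the sink. The receiver would then be handed to `jssc.receiverStream(...)`. For that to work, `Row` would also need to implement `Serializable`.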

The best and most reliable solution would be to avoid reading from MySQL at all. When you insert your records into MySQL, also publish them to Kafka (via a Kafka producer) as part of the same transaction, and then consume them in your streaming application.
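
One way to picture the write path suggested above is the sketch below. `DbTransaction` and `EventPublisher` are hypothetical stand-ins for a JDBC connection with auto-commit disabled and for a Kafka producer that blocks until the send is acknowledged. Neither is a real library type, and the ordering shown is only one possible choice.

```java
/** Hypothetical stand-in for a JDBC connection with auto-commit turned off. */
interface DbTransaction {
    void insert(String record) throws Exception;
    void commit() throws Exception;
    void rollback();
}

/** Hypothetical stand-in for a Kafka producer that blocks until the send is acknowledged. */
interface EventPublisher {
    void publish(String record) throws Exception;
}

/** Writes each record to the database and to the event log, or to neither if publishing fails. */
final class DualWriter {
    private final DbTransaction tx;
    private final EventPublisher publisher;

    DualWriter(DbTransaction tx, EventPublisher publisher) {
        this.tx = tx;
        this.publisher = publisher;
    }

    /** Returns true only if the record was inserted, published, and committed. */
    boolean write(String record) {
        try {
            tx.insert(record);          // staged, not yet visible to other readers
            publisher.publish(record);  // throws if the broker rejects the record
            tx.commit();                // make the insert durable
            return true;
        } catch (Exception e) {
            tx.rollback();              // discard the staged insert
            return false;
        }
    }
}
```

Note that this ordering cannot undo a message that was already published if the final `commit()` fails, so the streaming application may need to tolerate records that never reached MySQL.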

I don't think it's possible to stream from MySQL. Data can be ingested from many sources like Kafka, Flume, Twitter, ZeroMQ, Kinesis, or TCP sockets.

