Creating a Flink DataStream from database query results
In my problem I need to query a database and join the query results with a Kafka data stream in Flink. Currently this is done by storing the query results in a file and then using Flink's readFile functionality to create a DataStream of query results. What would be a better approach that bypasses the intermediate step of writing to a file and creates a DataStream directly from the query results?
My current understanding is that I would need to write a custom SourceFunction as suggested here. Is this the right (and only) way, or are there alternatives?
Are there any good resources for writing custom SourceFunctions, or should I just look at existing implementations for reference and adapt them to my needs?
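For what it's worth, a custom source along those lines can be fairly small. Below is a minimal sketch of a RichSourceFunction that runs a JDBC query once and emits each row as a String (the JDBC URL, credentials, query, and table are placeholders; in practice you would likely emit a POJO or Flink Row instead, and you need a JDBC driver and the Flink streaming API on the classpath):

```java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

// Sketch: emits one record per row of a (hypothetical) query result.
public class JdbcQuerySource extends RichSourceFunction<String> {

    private transient Connection connection;
    private volatile boolean running = true;

    @Override
    public void open(Configuration parameters) throws Exception {
        // Placeholder connection details -- replace with your own.
        connection = DriverManager.getConnection(
                "jdbc:postgresql://localhost:5432/mydb", "user", "password");
    }

    @Override
    public void run(SourceContext<String> ctx) throws Exception {
        try (Statement stmt = connection.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT id, name FROM my_table")) {
            while (running && rs.next()) {
                // Emit under the checkpoint lock so records and
                // checkpoints don't interleave.
                synchronized (ctx.getCheckpointLock()) {
                    ctx.collect(rs.getLong("id") + "," + rs.getString("name"));
                }
            }
        }
    }

    @Override
    public void cancel() {
        running = false;
    }

    @Override
    public void close() throws Exception {
        if (connection != null) {
            connection.close();
        }
    }
}
```

Registering it with `env.addSource(new JdbcQuerySource())` yields a DataStream that can then be connected or joined with the Kafka stream.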
One straightforward solution would be to use a lookup join, perhaps with caching enabled.
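A lookup join in Flink SQL might be sketched as follows, assuming the flink-connector-kafka and flink-connector-jdbc dependencies are available; all table names, columns, and connection options here are placeholders:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class LookupJoinExample {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());

        // Kafka-backed stream with a processing-time attribute for the lookup.
        tEnv.executeSql(
            "CREATE TABLE orders (" +
            "  order_id BIGINT, customer_id BIGINT," +
            "  proc_time AS PROCTIME()" +
            ") WITH (" +
            "  'connector' = 'kafka', 'topic' = 'orders'," +
            "  'properties.bootstrap.servers' = 'localhost:9092'," +
            "  'format' = 'json')");

        // JDBC dimension table; the lookup cache avoids one query per record.
        tEnv.executeSql(
            "CREATE TABLE customers (" +
            "  id BIGINT, name STRING" +
            ") WITH (" +
            "  'connector' = 'jdbc'," +
            "  'url' = 'jdbc:postgresql://localhost:5432/mydb'," +
            "  'table-name' = 'customers'," +
            "  'lookup.cache.max-rows' = '10000'," +
            "  'lookup.cache.ttl' = '10min')");

        // Lookup join: each order is enriched against the database
        // at processing time.
        tEnv.executeSql(
            "SELECT o.order_id, c.name " +
            "FROM orders AS o " +
            "JOIN customers FOR SYSTEM_TIME AS OF o.proc_time AS c " +
            "ON o.customer_id = c.id").print();
    }
}
```

This keeps the join logic in SQL and avoids writing any custom source code at all.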
Other possible solutions include Kafka Connect, or using something like Debezium to mirror the database table into Flink. Here's an example: https://github.com/ververica/flink-sql-CDC .
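The CDC approach from that example can be sketched like this, assuming the flink-cdc-connectors (mysql-cdc) dependency; hostnames, credentials, and table names are again placeholders:

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class CdcMirrorExample {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
                EnvironmentSettings.newInstance().inStreamingMode().build());

        // Debezium-based CDC source: Flink sees the database table as a
        // continuously updating changelog, so there is no intermediate
        // file and no polling.
        tEnv.executeSql(
            "CREATE TABLE customers_mirror (" +
            "  id BIGINT, name STRING," +
            "  PRIMARY KEY (id) NOT ENFORCED" +
            ") WITH (" +
            "  'connector' = 'mysql-cdc'," +
            "  'hostname' = 'localhost', 'port' = '3306'," +
            "  'username' = 'user', 'password' = 'password'," +
            "  'database-name' = 'mydb', 'table-name' = 'customers')");

        // The mirrored table can now be joined with any other stream in SQL.
        tEnv.executeSql("SELECT * FROM customers_mirror").print();
    }
}
```

Unlike the lookup join, this keeps a live copy of the table inside Flink, so joins reflect database updates without per-record queries.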