Creating a Flink DataStream from database query results

Original post: 2022-03-17 11:24:52 | Tags: java, amazon-redshift, apache-flink, flink-streaming

I need to query a database and join the query results with a Kafka data stream in Flink. Currently this is done by storing the query results in a file and then using Flink's readFile functionality to create a DataStream of the query results. What would be a better approach that bypasses the intermediate step of writing to a file and creates a DataStream directly from the query results?

My current understanding is that I would need to write a custom SourceFunction, as suggested here. Is this the right and only way, or are there alternatives?

Are there any good resources for writing custom SourceFunctions, or should I just look at existing implementations for reference and customise them for my needs?

1 answer

One straightforward solution would be to use a lookup join, perhaps with caching enabled.
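To make the idea concrete, here is a rough plain-Java sketch of what a lookup join with caching does for each incoming record: look the key up in the database, remember the result, and reuse it for later records with the same key. This is not Flink's API or implementation. The class names, the `dbLookup` function (standing in for a query by key), and the `maxRows` limit are all made up for illustration. In Flink itself you would normally express this in SQL and configure the cache through connector options, so check the connector documentation for your version.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Conceptual sketch only: plain Java, not Flink's API.
final class CachedLookupJoinSketch {

    /** LRU cache in front of a (hypothetical) database lookup by key. */
    static final class CachedLookup<K, V> {
        private final Function<K, V> dbLookup; // stands in for a query such as SELECT ... WHERE id = ?
        private final Map<K, V> cache;
        int dbCalls = 0;                       // exposed only to show the effect of the cache

        CachedLookup(Function<K, V> dbLookup, int maxRows) {
            this.dbLookup = dbLookup;
            // An access-ordered LinkedHashMap evicts the least recently used entry.
            this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                    return size() > maxRows;
                }
            };
        }

        V get(K key) {
            if (cache.containsKey(key)) {
                return cache.get(key);         // cache hit: no database round trip
            }
            dbCalls++;
            V value = dbLookup.apply(key);     // cache miss: ask the database
            cache.put(key, value);             // misses (null) are cached too
            return value;
        }
    }

    /** Enriches each stream record with the row looked up by its key. */
    static List<String> join(List<String> userIds, CachedLookup<String, String> lookup) {
        List<String> out = new ArrayList<>();
        for (String id : userIds) {
            String name = lookup.get(id);
            if (name != null) {                // inner-join semantics: drop unmatched keys
                out.add(id + ":" + name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> table = Map.of("u1", "Ada", "u2", "Linus"); // pretend database table
        CachedLookup<String, String> lookup = new CachedLookup<>(table::get, 100);
        System.out.println(join(List.of("u1", "u2", "u1", "u3"), lookup));
        System.out.println("database calls: " + lookup.dbCalls);
    }
}
```

The trade-off is the usual one for caches: fewer database round trips in exchange for possibly stale rows, which is why a real setup would also bound how long entries live.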

Other possible solutions include Kafka Connect, or using something like Debezium to mirror the database table into Flink. Here's an example: https://github.com/ververica/flink-sql-CDC.
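For intuition about the mirroring approach, the following plain-Java sketch shows the general shape: row-level change events are applied to a local copy of the table, and stream records are enriched from that copy instead of querying the database. It is not Debezium's or Flink's API. The `Change` event type, the `Op` enum, and the method names are hypothetical, and a real pipeline would keep the mirrored table in Flink state rather than in a `HashMap`.

```java
import java.util.HashMap;
import java.util.Map;

// Conceptual sketch only: plain Java, not Debezium's or Flink's API.
final class ChangelogMirrorSketch {

    enum Op { UPSERT, DELETE }

    /** Hypothetical change event: one row-level change captured from the database. */
    static final class Change {
        final Op op;
        final String key;
        final String value; // ignored for DELETE

        Change(Op op, String key, String value) {
            this.op = op;
            this.key = key;
            this.value = value;
        }
    }

    // Local mirror of the database table, kept current by applying change events.
    private final Map<String, String> mirror = new HashMap<>();

    void apply(Change change) {
        if (change.op == Op.DELETE) {
            mirror.remove(change.key);
        } else {
            mirror.put(change.key, change.value);
        }
    }

    /** Joins one stream record against the mirror; returns null when there is no match. */
    String enrich(String key) {
        String value = mirror.get(key);
        return value == null ? null : key + ":" + value;
    }

    public static void main(String[] args) {
        ChangelogMirrorSketch sketch = new ChangelogMirrorSketch();
        sketch.apply(new Change(Op.UPSERT, "u1", "Ada"));
        System.out.println(sketch.enrich("u1"));
        sketch.apply(new Change(Op.UPSERT, "u1", "Ada L."));
        System.out.println(sketch.enrich("u1"));
        sketch.apply(new Change(Op.DELETE, "u1", null));
        System.out.println(sketch.enrich("u1"));
    }
}
```

Compared with a lookup join, this keeps the join local and picks up updates and deletes as they arrive, at the cost of running a change-capture pipeline.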

