Apache Flink Mapping at Runtime

I have built a Flink streaming job that reads an XML file from Kafka, converts it, and writes it to a database. Because the attributes in the XML file don't match the database column names, I built a switch-case for the mapping.

As this is not very flexible, I want to take this hardwired mapping information out of the code. My first idea was a mapping file, which could look like this:

path.in.xml.to.attribute=database.column.name

The current job logic looks like this:

switch (path.in.xml.to.attribute) {
    case "example.one.name":
        return "name";
    // ... further cases
}

With the mapping file, I guess I would use a Map to store the mapping data as key-value pairs.
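A minimal sketch of that idea, assuming the mapping file uses the standard Java properties format (one `path=column` pair per line). The class name `MappingLoader` and the sample entry are made up for illustration and are not part of the original job:

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Hypothetical helper: reads "xml.path=column" lines into a Map.
final class MappingLoader {

    // Parses properties-style text and copies every entry into a plain Map.
    static Map<String, String> load(Reader source) throws IOException {
        Properties props = new Properties();
        props.load(source);
        Map<String, String> mapping = new HashMap<>();
        for (String key : props.stringPropertyNames()) {
            mapping.put(key, props.getProperty(key));
        }
        return mapping;
    }

    public static void main(String[] args) throws IOException {
        // In the real job the Reader would wrap the mapping file instead of a string.
        Map<String, String> mapping = load(new StringReader("example.one.name=name\n"));
        // Replaces the switch statement: look the column name up by XML path.
        System.out.println(mapping.get("example.one.name")); // prints "name"
    }
}
```

A lookup for a path that is not in the file returns null, so the caller has to decide how to handle unmapped attributes.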

This would make the job more flexible than it is right now. A remaining downside is that I would have to restart the Flink job for every change to this configuration that I want to apply.

My question is whether it is possible to inject this kind of mapping logic at runtime, for example via a dedicated Kafka topic. And if such an implementation is possible, what could it look like, as an example?

If all you need is to be able to update the mapping between the XML attributes and the database column names, then The Broadcast State Pattern can be used. The article A Practical Guide to Broadcast State in Apache Flink is useful as well.

The idea is to have a second stream, subscribed to your own Kafka topic carrying the database mappings, which broadcasts the updates to all parallel instances of the downstream operator. Each instance maintains this Map<String, String> as state, and you can use this mapping state to resolve the column name, i.e. instead of switch(path.in.xml.to.attribute) use map.get(path.in.xml.to.attribute). The map operator in this case should be replaced with a BroadcastProcessFunction.
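A rough sketch of how this could look with the DataStream API, assuming the flink-streaming-java dependency is on the classpath. The class names, the state name "xml-to-column", and the `path=column` update format are assumptions for illustration; the two `fromElements` sources are stand-ins for the real Kafka sources:

```java
import org.apache.flink.api.common.state.MapStateDescriptor;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.datastream.BroadcastStream;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.co.BroadcastProcessFunction;
import org.apache.flink.util.Collector;

public class DynamicMappingJob {

    // Broadcast state layout: XML attribute path -> database column name.
    static final MapStateDescriptor<String, String> MAPPING =
            new MapStateDescriptor<>("xml-to-column", Types.STRING, Types.STRING);

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Stand-ins (assumption): both streams would be Kafka sources in the real job.
        DataStream<String> xmlPaths = env.fromElements("example.one.name");
        DataStream<String> mappingUpdates = env.fromElements("example.one.name=name");

        // Every parallel instance of the downstream operator receives each update.
        BroadcastStream<String> broadcast = mappingUpdates.broadcast(MAPPING);

        xmlPaths.connect(broadcast)
                .process(new ResolveColumn())
                .print();

        env.execute("dynamic-mapping");
    }

    static class ResolveColumn extends BroadcastProcessFunction<String, String, String> {

        @Override
        public void processElement(String path, ReadOnlyContext ctx, Collector<String> out)
                throws Exception {
            // Replaces the switch: read-only lookup in the broadcast state.
            String column = ctx.getBroadcastState(MAPPING).get(path);
            if (column != null) {
                out.collect(column);
            }
            // Paths with no known mapping are silently dropped in this sketch.
        }

        @Override
        public void processBroadcastElement(String update, Context ctx, Collector<String> out)
                throws Exception {
            // Assumed update format: "xml.path=column".
            int eq = update.indexOf('=');
            if (eq > 0) {
                ctx.getBroadcastState(MAPPING)
                   .put(update.substring(0, eq).trim(), update.substring(eq + 1).trim());
            }
        }
    }
}
```

Flink gives no ordering guarantee between the two inputs, so a data element can arrive before its mapping has been broadcast. This sketch simply drops such elements; a real job would probably buffer them or route them to a side output instead.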
