Using grok in Flink streaming
The Flink pipeline is as follows:
Below is the code for pattern matching using grok.
SingleOutputStreamOperator<JSONObject> mainStream = messageStream.rebalance()
        .map(new MapFunction<String, JSONObject>() {
            private static final long serialVersionUID = 6;

            @Override
            public JSONObject map(String value) throws Exception {
                JSONObject logJson = new JSONObject();
                // grok (a Grok instance) and pattern are defined elsewhere in the class (not shown)
                grok.compile(pattern);          // the pattern is recompiled for every message
                Match gm = grok.match(value);   // match the raw log line against the pattern
                gm.captures();                  // extract the named captures
                logJson.putAll(gm.toMap());     // copy the captures into the JSON output
                return logJson;
            }
        });
In the above code, calling grok.compile(pattern) inside the map function works fine. Calling it outside the map function instead gives the following error:
The implementation of the MapFunction is not serializable
Caused by: java.io.NotSerializableException: com.google.code.regexp.Pattern
Is there any way I could move grok.compile outside the map function? As I understand it, compiling the pattern for every message is unnecessary and might become a bottleneck if the number of messages becomes quite large.
PS: I have imported io.thekraken.grok.api.Grok.
EDIT:
I looked through the grok implementation, and the Grok class implements Serializable: https://github.com/thekrakken/java-grok/blob/master/src/main/java/io/thekraken/grok/api/Grok.java
Your code does not show where the local variable grok comes from, but:
Flink requires all operators to be Serializable because they might be moved around in a cluster. This also holds true for all members of operators. Can you post a complete non-working example? That would make it easier to see where serialization fails.
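To illustrate why a class that implements Serializable can still fail, here is a minimal plain-Java sketch (no Flink or grok needed). Flink checks user functions such as a MapFunction with Java serialization before shipping them, so the same rule applies there: every non-transient member reachable from the object must be serializable too. NonSerializablePattern and GrokLike are hypothetical stand-ins for com.google.code.regexp.Pattern and Grok, not the real classes. This would also be consistent with what the question observes: if grok.compile(pattern) only runs inside map(), the Grok object presumably holds no compiled Pattern yet when the function is serialized.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationCheck {

    // Hypothetical stand-in for com.google.code.regexp.Pattern: NOT Serializable.
    static class NonSerializablePattern {
        final String regex;
        NonSerializablePattern(String regex) { this.regex = regex; }
    }

    // Hypothetical stand-in for Grok: the class itself is Serializable...
    static class GrokLike implements Serializable {
        private static final long serialVersionUID = 1L;
        // ...but this member is not, so a compiled instance cannot be serialized.
        NonSerializablePattern compiled;

        void compile(String regex) { compiled = new NonSerializablePattern(regex); }
    }

    // Returns true if the object can be written with Java serialization.
    static boolean isSerializable(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            System.out.println("Caused by: " + e);
            return false;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        GrokLike grok = new GrokLike();
        System.out.println(isSerializable(grok)); // true: compiled is still null
        grok.compile("\\d+");
        System.out.println(isSerializable(grok)); // false: NonSerializablePattern is reached
    }
}
```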
More information about Flink serialization can be found in the Flink documentation at https://flink.apache.org/faq.html#why-am-i-getting-a-nonserializableexception- and https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/types_serialization.html
Basically, you can register a Kryo serializer for custom types, or implement (de-)serialization yourself if you need operator members that are not directly serializable.
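The "implement (de-)serialization yourself" option can be sketched in plain Java: serialize only the pattern string, mark the compiled object transient, and rebuild it in a private readObject hook. NonSerializableCompiled is a hypothetical stand-in for the non-serializable compiled grok pattern (here it just wraps java.util.regex.Pattern); in a real job the grok compile step would go in its place. Treat this as a sketch of the idea rather than a drop-in fix.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CustomSerialization {

    // Hypothetical stand-in for a compiled grok pattern: NOT Serializable.
    static class NonSerializableCompiled {
        final Pattern pattern;
        NonSerializableCompiled(String regex) { pattern = Pattern.compile(regex); }
    }

    // Ships only the regex string; the compiled form is rebuilt after deserialization.
    static class ParseFunction implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String regex;                          // serialized
        private transient NonSerializableCompiled compiled;  // skipped by serialization

        ParseFunction(String regex) {
            this.regex = regex;
            this.compiled = new NonSerializableCompiled(regex);
        }

        // Custom deserialization hook: restore the fields, then recompile once.
        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            compiled = new NonSerializableCompiled(regex);
        }

        String firstMatch(String value) {
            Matcher m = compiled.pattern.matcher(value);
            return m.find() ? m.group() : null;
        }
    }

    // Round-trips an object through Java serialization, similar to shipping it to a worker.
    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        ParseFunction copy = roundTrip(new ParseFunction("\\d+"));
        System.out.println(copy.firstMatch("status=404 size=512")); // prints 404
    }
}
```

Because the compiled object is transient, it never has to be serializable, and each deserialized copy compiles the pattern exactly once.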
By the way, I think you are right to try to reduce the number of times the pattern is compiled.
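In Flink itself, a common way to compile only once per parallel instance is to extend RichMapFunction and do the compilation in its open() method, which runs after the function has been deserialized on the worker. The sketch below imitates that idea without Flink, using a transient field that is compiled lazily on first use, and counts how often each variant compiles. CompileEveryTime, CompileLazily and compileCount are hypothetical names for illustration only, and java.util.regex.Pattern stands in for the grok pattern.

```java
import java.io.Serializable;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CompileOnce {

    static int compileCount = 0; // counts how often a pattern was compiled

    static Pattern compile(String regex) {
        compileCount++;
        return Pattern.compile(regex);
    }

    // Per-message compilation, like calling grok.compile(pattern) inside map().
    static class CompileEveryTime {
        private final String regex;
        CompileEveryTime(String regex) { this.regex = regex; }

        String map(String value) {
            Matcher m = compile(regex).matcher(value);
            return m.find() ? m.group() : "";
        }
    }

    // Compile lazily on first use and reuse afterwards. The transient field is skipped
    // if the object is serialized, so each deserialized copy compiles once on first use.
    static class CompileLazily implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String regex;
        private transient Pattern compiled;
        CompileLazily(String regex) { this.regex = regex; }

        String map(String value) {
            if (compiled == null) {
                compiled = compile(regex);
            }
            Matcher m = compiled.matcher(value);
            return m.find() ? m.group() : "";
        }
    }

    public static void main(String[] args) {
        List<String> messages = List.of("a=1", "b=22", "c=333");

        CompileEveryTime slow = new CompileEveryTime("\\d+");
        for (String msg : messages) slow.map(msg);
        System.out.println("per message: " + compileCount); // per message: 3

        compileCount = 0;
        CompileLazily fast = new CompileLazily("\\d+");
        for (String msg : messages) fast.map(msg);
        System.out.println("lazy: " + compileCount); // lazy: 1
    }
}
```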