Using grok in flink streaming

The Flink pipeline is as follows:

  1. Read messages (strings) from a Kafka topic.
  2. Match each message against a grok pattern and convert the result to JSON.
  3. Aggregate over a time window on a field extracted from the JSON.

Below is the code for pattern matching using grok.

    SingleOutputStreamOperator<JSONObject> mainStream = messageStream.rebalance()
            .map(new MapFunction<String, JSONObject>() {
                private static final long serialVersionUID = 6L;

                @Override
                public JSONObject map(String value) throws Exception {
                    JSONObject logJson = new JSONObject();
                    // pattern is defined in the enclosing class; it is recompiled for every message
                    grok.compile(pattern);
                    Match gm = grok.match(value);
                    gm.captures(); // run the match and fill the capture map
                    logJson.putAll(gm.toMap());
                    return logJson;
                }
            });

In the code above, calling grok.compile(pattern) inside the map function works fine. Compiling the pattern outside the map function instead gives the following error:

The implementation of the MapFunction is not serializable

Caused by: java.io.NotSerializableException: com.google.code.regexp.Pattern

Is there any way to move grok.compile out of the map function? As I understand it, compiling the pattern for every message is unnecessary and could become a bottleneck when the number of messages grows large.

PS: I am importing io.thekraken.grok.api.Grok.

EDIT:

I looked through the grok implementation, and the Grok class implements Serializable: https://github.com/thekrakken/java-grok/blob/master/src/main/java/io/thekraken/grok/api/Grok.java

Your code does not show where the local variable grok comes from, but:

Flink requires all operators to be Serializable because they might be moved around in a cluster. This also holds true for all members of operators. Can you post a complete non-working example? This might make it easier to see where serialization might fail.
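To illustrate why a single member can break serialization of the whole function, here is a minimal sketch using only the JDK. It assumes the function object is shipped with standard Java serialization. `CompiledPattern`, `EagerMapper`, and `LazyMapper` are hypothetical names made up for this example; `CompiledPattern` merely stands in for a non-serializable class such as the `com.google.code.regexp.Pattern` named in the stack trace. Marking the member `transient` and compiling lazily is one possible workaround, not necessarily the only or best one.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.regex.Pattern;

class TransientPatternSketch {

    // Hypothetical stand-in for a compiled pattern class that is not Serializable.
    static final class CompiledPattern {
        private final Pattern regex;

        CompiledPattern(String expression) {
            this.regex = Pattern.compile(expression);
        }

        boolean matches(String input) {
            return regex.matcher(input).matches();
        }
    }

    // Holds the compiled pattern as a regular field, so serialization fails.
    static final class EagerMapper implements Serializable {
        private static final long serialVersionUID = 1L;
        private final CompiledPattern compiled;

        EagerMapper(String expression) {
            this.compiled = new CompiledPattern(expression);
        }
    }

    // Ships only the pattern string; compiles once per instance on first use.
    static final class LazyMapper implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String expression;
        private transient CompiledPattern compiled; // skipped by serialization

        LazyMapper(String expression) {
            this.expression = expression;
        }

        boolean map(String value) {
            if (compiled == null) {
                compiled = new CompiledPattern(expression);
            }
            return compiled.matches(value);
        }
    }

    // Returns true if Java serialization rejects the object.
    static boolean failsToSerialize(Object candidate) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(candidate);
            return false;
        } catch (NotSerializableException e) {
            return true;
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Serializes and deserializes a mapper, roughly what happens when a function is shipped to a worker.
    static LazyMapper roundTrip(LazyMapper mapper) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(mapper);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (LazyMapper) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(failsToSerialize(new EagerMapper("\\d+")));
        LazyMapper copy = roundTrip(new LazyMapper("\\d+"));
        System.out.println(copy.map("12345"));
    }
}
```

In a Flink job, a similar effect is often achieved by extending RichMapFunction and compiling the pattern in open(), which runs once per parallel instance after the function has been deserialized.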

More information about Flink serialization can be found in the Flink documentation at https://flink.apache.org/faq.html#why-am-i-getting-a-nonserializableexception- and https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/types_serialization.html

Basically, you can register a Kryo serializer for custom types or implement (de-)serialization yourself if you need operator members that are not directly serializable.
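One way to read "implement (de-)serialization yourself" is the readObject hook of standard Java serialization: keep the non-serializable member transient and rebuild it when the object is deserialized. The sketch below assumes the function object travels via Java serialization. `CompiledPattern` and `PatternMapper` are hypothetical names for illustration only and are not part of java-grok or Flink.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.regex.Pattern;

class ReadObjectPatternSketch {

    // Hypothetical stand-in for a compiled pattern class that is not Serializable.
    static final class CompiledPattern {
        private final Pattern regex;

        CompiledPattern(String expression) {
            this.regex = Pattern.compile(expression);
        }

        boolean matches(String input) {
            return regex.matcher(input).matches();
        }
    }

    static final class PatternMapper implements Serializable {
        private static final long serialVersionUID = 1L;
        private final String expression;
        private transient CompiledPattern compiled; // never written to the stream

        PatternMapper(String expression) {
            this.expression = expression;
            this.compiled = new CompiledPattern(expression);
        }

        boolean map(String value) {
            return compiled.matches(value);
        }

        // Invoked by Java serialization on the receiving side.
        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();                          // restores `expression`
            this.compiled = new CompiledPattern(expression); // recompile once per deserialized copy
        }
    }

    // Serializes and deserializes the mapper in memory.
    static PatternMapper roundTrip(PatternMapper mapper) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(mapper);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (PatternMapper) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PatternMapper copy = roundTrip(new PatternMapper("[a-z]+"));
        System.out.println(copy.map("flink"));
    }
}
```

With this approach the pattern is compiled once when the object is constructed and once per deserialized copy, rather than once per message.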

By the way, I think you are right to try to reduce the number of times the pattern is compiled.
