
Flink: Extract an array from a Row field

I am using Flink with a custom function in a map operator. This custom function takes a Flink Row as input and outputs a Map of (String, Object) that contains each field name and value of my row.

In the basic case this function works well, but now I need to do some processing on a specific field, which is an array of integers. In this case, I have trouble extracting the data in my Row into a Java collection object (a list, an array, or anything similar).

Here is the code of my CustomMap:

public class CustomMap implements MapFunction<Row, Map<String, Object>> {

    private final String arrayField = "ArrayField";
    private final String[] fields = {"genTimestampMs", "some_field", "timestampMs", "some_field_2", "ArrayField"};

    public CustomMap(){}

    @Override
    public Map<String, Object> map(Row myRow) throws Exception {
        LOGGER.debug("Mapping the row "+myRow.toString());

        final Map<String, Object> m = new HashMap<>();

        for (int i = 0; i < myRow.getArity(); i++) {
            LOGGER.debug("  Field "+i);
            if (arrayField.equals(fields[i])) {
                LOGGER.debug("Is the field  "+arrayField);
                Integer wCount = 0;

                LOGGER.debug("  row0 : "+myRow.getField(i));
                Row test = Row.of(myRow.getField(i));
                LOGGER.debug("  row : "+test);
                LOGGER.debug("  getArity: "+test.getArity());

                List<Integer> myList = (List<Integer>)myRow.getField(i); // <--- Error here

                String value = // Do something with my list

                m.put(fields[i], value);

            } else {
                LOGGER.debug("  Put field in map : ("+fields[i]+" -> "+myRow.getField(i)+")");
                m.put(fields[i], myRow.getField(i));
            }
        }
        return m;
    }
}

Here is an example of the JSON I use as input data (it is parsed with Flink's JsonRowDeserializationSchema):

{"genTimestampMs": 1561130625000, "some_field": "some_value", "timestampMs": 1561130625000, "some_field_2":"some_value", "ArrayField": [1,2,3,4,5]}

And here are the logs of my code running on this data:

2019-06-27 13:40:02.854 [Source: Custom Source -> Map -> Map -> Sink: Unnamed (5/12)] DEBUG CustomMap  - Mapping the row 1561130625000,some_value,1561130625000,some_value,[1, 2, 3, 4, 5]
2019-06-27 13:40:02.854 [Source: Custom Source -> Map -> Map -> Sink: Unnamed (5/12)] DEBUG CustomMap  -   Field 0
2019-06-27 13:40:02.858 [Source: Custom Source -> Map -> Map -> Sink: Unnamed (5/12)] DEBUG CustomMap  -   Put field in map : (genTimestampMs -> 1561130625000)
2019-06-27 13:40:02.858 [Source: Custom Source -> Map -> Map -> Sink: Unnamed (5/12)] DEBUG CustomMap  -   Field 1
2019-06-27 13:40:02.858 [Source: Custom Source -> Map -> Map -> Sink: Unnamed (5/12)] DEBUG CustomMap  -   Put field in map : (some_field -> some_value)
2019-06-27 13:40:02.858 [Source: Custom Source -> Map -> Map -> Sink: Unnamed (5/12)] DEBUG CustomMap  -   Field 2
2019-06-27 13:40:02.858 [Source: Custom Source -> Map -> Map -> Sink: Unnamed (5/12)] DEBUG CustomMap  -   Put field in map : (timestampMs -> 1561130625000)
2019-06-27 13:40:02.858 [Source: Custom Source -> Map -> Map -> Sink: Unnamed (5/12)] DEBUG CustomMap  -   Field 3
2019-06-27 13:40:02.858 [Source: Custom Source -> Map -> Map -> Sink: Unnamed (5/12)] DEBUG CustomMap  -   Put field in map : (some_field_2 -> some_value)
2019-06-27 13:40:02.858 [Source: Custom Source -> Map -> Map -> Sink: Unnamed (5/12)] DEBUG CustomMap  -   Field 4
2019-06-27 13:40:02.858 [Source: Custom Source -> Map -> Map -> Sink: Unnamed (5/12)] DEBUG CustomMap  -   Is the field  ArrayField
2019-06-27 13:40:02.858 [Source: Custom Source -> Map -> Map -> Sink: Unnamed (5/12)] DEBUG CustomMap  -   row0 : [Ljava.lang.Integer;@68374747
2019-06-27 13:40:02.859 [Source: Custom Source -> Map -> Map -> Sink: Unnamed (5/12)] DEBUG CustomMap  -   row : [1, 2, 3, 4, 5]
2019-06-27 13:40:02.859 [Source: Custom Source -> Map -> Map -> Sink: Unnamed (5/12)] DEBUG CustomMap  -   getArity: 1 
java.lang.ClassCastException: class [Ljava.lang.Integer; cannot be cast to class java.util.List ([Ljava.lang.Integer; and java.util.List are in module java.base of loader 'bootstrap')

Note:

  • Casting the variable test does not work either: java.lang.ClassCastException: class org.apache.flink.types.Row cannot be cast to class java.util.List (org.apache.flink.types.Row is in unnamed module of loader 'app'; java.util.List is in module java.base of loader 'bootstrap')

The following cast fails:

List<Integer> myList = (List<Integer>)myRow.getField(i); // <--- Error here

This is not a precedence problem. In Java a method call binds more tightly than a cast, so this expression already calls getField() on myRow (which returns Object) and then casts that Object to List<Integer>. Adding parentheses does not change its meaning:

List<Integer> myList = (List<Integer>)(myRow.getField(i));

Both forms throw the same ClassCastException, so the cause has to be the runtime type of the field value. The exception message confirms this: the class it names as not castable to java.util.List is [Ljava.lang.Integer; (an Integer[]), not Row.
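A minimal sketch, independent of Flink, of how Java parses a cast placed in front of a method call. FakeRow is a hypothetical stand-in for Row (it only mimics a getField method returning Object); the two helper methods are likewise illustrative. With an Integer[] stored in the field, both the unparenthesized and the parenthesized cast fail in the same way, and with a real List both succeed.

```java
import java.util.Arrays;
import java.util.List;

class CastPrecedenceExample {

    // Hypothetical stand-in for Row: getField returns Object.
    static class FakeRow {
        private final Object[] fields;
        FakeRow(Object[] fields) { this.fields = fields; }
        Object getField(int i) { return fields[i]; }
    }

    // Cast written without extra parentheses: the cast applies to the result of getField(0).
    @SuppressWarnings("unchecked")
    static boolean castWithoutParens(FakeRow row) {
        try {
            List<Integer> list = (List<Integer>) row.getField(0);
            return list != null;
        } catch (ClassCastException e) {
            return false;
        }
    }

    // Same cast with explicit parentheses around the method call.
    @SuppressWarnings("unchecked")
    static boolean castWithParens(FakeRow row) {
        try {
            List<Integer> list = (List<Integer>) (row.getField(0));
            return list != null;
        } catch (ClassCastException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        FakeRow arrayRow = new FakeRow(new Object[] { new Integer[] {1, 2, 3} });
        FakeRow listRow = new FakeRow(new Object[] { Arrays.asList(1, 2, 3) });
        // Integer[] field: prints "false false"
        System.out.println(castWithoutParens(arrayRow) + " " + castWithParens(arrayRow));
        // List field: prints "true true"
        System.out.println(castWithoutParens(listRow) + " " + castWithParens(listRow));
    }
}
```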

EDIT: I think the issue is that the object is an array of Integer (the log prints it as [Ljava.lang.Integer;, which is the JVM name for Integer[]), not a List. Try the following:

List<Integer> myList = Arrays.asList((Integer[])(myRow.getField(i)));
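The conversion above can be sketched as a small self-contained helper. This is an illustration under assumptions, not Flink API: toIntegerList is a hypothetical name, a plain Object stands in for the value returned by myRow.getField(i), and the int[] branch is only a guess at what a schema declaring a primitive array might produce. Note that Arrays.asList returns a fixed-size view backed by the array, so the sketch copies it into an ArrayList to get an independent, resizable list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ArrayFieldExample {

    // Hypothetical helper: turns a field value (as returned by getField) into a List<Integer>.
    static List<Integer> toIntegerList(Object field) {
        if (field instanceof Integer[]) {
            // Boxed array, as seen in the logs ([Ljava.lang.Integer;).
            return new ArrayList<>(Arrays.asList((Integer[]) field));
        }
        if (field instanceof int[]) {
            // Assumption: a primitive array could appear with a different schema.
            List<Integer> out = new ArrayList<>();
            for (int v : (int[]) field) {
                out.add(v);
            }
            return out;
        }
        throw new IllegalArgumentException(
                "Unsupported field type: " + (field == null ? "null" : field.getClass().getName()));
    }

    public static void main(String[] args) {
        Object field = new Integer[] {1, 2, 3, 4, 5}; // stand-in for myRow.getField(i)
        List<Integer> myList = toIntegerList(field);
        int sum = 0;
        for (int v : myList) {
            sum += v;
        }
        System.out.println(myList + " sum=" + sum); // prints "[1, 2, 3, 4, 5] sum=15"
    }
}
```

Checking the type with instanceof before casting turns an unexpected field type into an explicit error message instead of a ClassCastException deep inside the map function.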
