Can I return a Tuple2 from an Apache Spark UDF (in Java)?

Question

I need a UDF2 that takes two arguments, corresponding to two DataFrame columns of types String and mllib.linalg.Vector, and returns a Tuple2. Is this doable? If so, how do I register this udf()?

hiveContext.udf().register("getItemData", get_item_data, WHAT GOES HERE FOR RETURN TYPE?);

The UDF is defined as follows:

UDF2<String, org.apache.spark.mllib.linalg.Vector, Tuple2<String, org.apache.spark.mllib.linalg.Vector>> get_item_data =
    (String id, org.apache.spark.mllib.linalg.Vector features) -> {
        return new Tuple2<>(id, features);
    };

Answer 1

The return type goes there. For a Tuple2 it is a struct schema, which can be defined as follows:

import java.util.ArrayList;
import java.util.List;

import org.apache.spark.sql.types.DataType;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.mllib.linalg.VectorUDT;

// One struct field per Tuple2 element, in order.
List<StructField> fields = new ArrayList<>();
fields.add(DataTypes.createStructField("id", DataTypes.StringType, false));
fields.add(DataTypes.createStructField("features", new VectorUDT(), false));
DataType schema = DataTypes.createStructType(fields);

However, if all you need is a struct, without any additional processing, org.apache.spark.sql.functions.struct should do the trick:

df.select(struct(col("id"), col("features")));
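
To tie the schema back to the register call in the question, here is a minimal sketch of how the pieces might fit together. The class name ItemDataUdf and the helper methods itemSchema() and register() are made up for illustration and do not appear in the original post. It assumes the three-argument register(name, udf, returnType) overload and that the context passed in is a SQLContext (a HiveContext is one).

```java
import java.util.Arrays;

import org.apache.spark.mllib.linalg.Vector;
import org.apache.spark.mllib.linalg.VectorUDT;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.api.java.UDF2;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

import scala.Tuple2;

// Hypothetical helper class, named here only for illustration.
class ItemDataUdf {

    // Struct schema describing the two elements of the returned Tuple2.
    static StructType itemSchema() {
        return DataTypes.createStructType(Arrays.asList(
            DataTypes.createStructField("id", DataTypes.StringType, false),
            DataTypes.createStructField("features", new VectorUDT(), false)));
    }

    // Registers the UDF from the question, passing the struct schema
    // as the third argument (the return type).
    static void register(SQLContext sqlContext) {
        UDF2<String, Vector, Tuple2<String, Vector>> getItemData =
            (id, features) -> new Tuple2<>(id, features);
        sqlContext.udf().register("getItemData", getItemData, itemSchema());
    }
}
```

After ItemDataUdf.register(hiveContext), the UDF could be applied with something like df.select(callUDF("getItemData", col("id"), col("features"))), where callUDF and col come from org.apache.spark.sql.functions. If the Tuple2 return value causes trouble on a particular Spark version, returning a Row built with RowFactory.create(id, features) is an alternative worth trying.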

Answer 1
1 accepted 2017-01-09 21:37:58
