Hive UDF does not return expected result

I have written a UDF that can be called from Hive (Query Language). It takes 2 parameters and has the following logic:

- returns null if both arguments are null
- returns the non-null value if one argument is null
- returns the greater of the two values if both arguments are non-null

I have written the code, compiled the class, and successfully registered the JAR with Hive. I verified that I can see the function in Hive after creating the temporary function. The problem is that when I call it from a select, it just returns '_c0' rather than the expected value.

Here is the Java class definition.

package com.ispace.hive.udf;

import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.hive.ql.exec.Description;
import java.util.*;
/*
*
* Compilation on Local box is very environment specific but for the iMac in 2013, this command will compile the class:
* javac -target 1.6 -cp $(ls /usr/local/Cellar/hive/0.12.0/libexec/lib/hive-exec*.jar):/usr/local/Cellar/hadoop/1.2.1/libexec/lib/hadoop-core.jar com/ispace/hive/udf/GreaterOf.java
* 
* The above step creates a single .class file that needs to be bundled into a JAR (java archive file)
* To bundle a file or multiple files into a jar, you can run this:
*       jar cvf udfcomparer.jar ./com/ispace/hive/udf/GreaterOf.class ./com/ispace/hive/udf/LesserOf.class
*
* To call a UDF, you must add the JAR to your hive session and then create a 'temporary' function as follows:
*
* hive (default)> ADD JAR /Users/calvinimac/Documents/Safezone/Projects/prospect-visual/etl/scripts/ec2-emr/jars/udfcomparer.jar;            
* hive (default)> create temporary function inlinemax as 'com.ispace.hive.udf.GreaterOf';
*/

@Description(name = "GreaterOf",
             value = "_FUNC_(Integer s, Integer t) - returns the greater value of the two.\n"+
                        "If both values are null, it will return NULL.\n"+
                        "If one value is non null, it will return that value if the other is NULL.",
             extended = "Example:\n"
                    + " > SELECT _FUNC_(column1, column2) FROM src;")

public final class GreaterOf extends UDF {
  public Integer evaluate(final Integer s, final Integer t) {
    Integer result = null;

    if (s == null && t == null) { 
        result = null; 
    } else if (s == null) {
        result = t;
    } else if (t == null) {
        result = s;
    } else if (s >= t) {
        result = s;
    } else {
        result = t;   // t is strictly greater than s here
    }

    return result;
  }
}
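
The comparison rules described above can also be checked outside Hive. The following is a minimal sketch, assuming the last branch is meant to return the second argument when it is the larger one. `GreaterOfCheck` and `greaterOf` are hypothetical names used only for illustration, and the class deliberately does not depend on hive-exec.

```java
// Hypothetical stand-alone check; not part of the original UDF and not dependent on Hive.
final class GreaterOfCheck {

    // Same contract as the UDF's evaluate(): null-safe maximum of two Integers.
    static Integer greaterOf(final Integer s, final Integer t) {
        if (s == null) {
            return t;            // covers "only t is set" and "both null" (returns null)
        }
        if (t == null) {
            return s;
        }
        return s >= t ? s : t;   // unboxing is safe here: neither value is null
    }

    public static void main(String[] args) {
        System.out.println(greaterOf(null, null)); // prints null
        System.out.println(greaterOf(null, 4));    // prints 4
        System.out.println(greaterOf(2, null));    // prints 2
        System.out.println(greaterOf(2, 4));       // prints 4
    }
}
```

Using the boxed `Integer` type rather than `int` is what lets a null argument reach the method as a Java `null` instead of failing during unboxing.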

In Hive, I create a placeholder table named unused:

create table unused(id bigint);

Then I run this select:

select inlinemax(2,4) from unused;

I was expecting to get a result of 4, but instead I get '_c0'.

Is my UDF wrong? Will it handle Hive NULL values as arguments and correctly map them to my Integer method parameters?

Does unused have any rows in it? It looks like "_c0" is the derived column name that Hive produces, that is, the header rather than a value. The function is evaluated once per row, so to get any result rows, the table you query needs at least one row.

As Jerome pointed out, the Java UDF does return the expected result as long as the table (arbitrary though it is) contains at least one row of data.
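
The row-count point can be illustrated with a small model. This is only an analogy, assuming a function that is applied once per input row. `PerRowEvaluation` and `select` are hypothetical names and do not reflect Hive's actual execution engine.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.BinaryOperator;

// Hypothetical model of per-row evaluation; an analogy, not Hive internals.
final class PerRowEvaluation {

    // Calls fn(a, b) once for every row in the table and collects the outputs.
    static List<Integer> select(List<Long> tableRows, BinaryOperator<Integer> fn,
                                Integer a, Integer b) {
        List<Integer> out = new ArrayList<>();
        for (Long ignoredRow : tableRows) {
            out.add(fn.apply(a, b));
        }
        return out;
    }

    public static void main(String[] args) {
        BinaryOperator<Integer> max = (x, y) -> Math.max(x, y);

        // Empty table: the function is never called, so there are no result rows.
        System.out.println(select(Collections.<Long>emptyList(), max, 2, 4)); // prints []

        // One row: the function is called once and yields one result row.
        System.out.println(select(Arrays.asList(1L), max, 2, 4));             // prints [4]
    }
}
```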
