Hive UDF that uses a Hive table

I have developed a Hive UDF in Java that works correctly. My function returns the best match between the input and a column in a Hive table, so it has this simplified pseudo-code:

class myudf extends UDF {

    public Text evaluate(Text input) {

        getNewHiveConnection(); // I want to replace this with getCurrentHiveUserConnection();
        executeHiveQuery(input);
        return something;
    }
}

My question is: if this function is invoked by Hive, why do I need to connect to Hive in my code? Can I use the current connection of the user who is calling my function?

If you want to return the closest match from an entire column in a query, you could think of it as a kind of aggregation and use a Hive UDAF: https://cwiki.apache.org/confluence/display/Hive/GenericUDAFCaseStudy

There's also a quite handy tutorial: http://beekeeperdata.com/posts/hadoop/2015/08/17/hive-udaf-tutorial.html
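To make the "closest match as aggregation" idea concrete, here is a minimal plain-Java sketch of the state such an aggregate would carry. It does not use any Hive classes; the methods only mirror the roles that iterate(), merge() and terminate() play in a GenericUDAFEvaluator. The class name ClosestMatchSketch, the Buffer type, and the choice of Levenshtein edit distance as the "best match" metric are all assumptions made for illustration, since the question does not say how a match is scored.

```java
import java.util.Arrays;

/** Plain-Java sketch of "closest match" as an aggregation; no Hive classes are used. */
final class ClosestMatchSketch {

    /** Two-row dynamic-programming Levenshtein distance (assumed metric, not from the question). */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // reuse the two rows
        }
        return prev[b.length()];
    }

    /** Partial aggregation state: the best candidate seen so far for one input. */
    static final class Buffer {
        final String input;
        String best = null;
        int bestDistance = Integer.MAX_VALUE;

        Buffer(String input) { this.input = input; }

        /** Consume one value of the column (the role of iterate() in a UDAF). */
        void iterate(String candidate) {
            if (candidate == null) return; // skip NULL rows
            int d = editDistance(input, candidate);
            if (d < bestDistance) { bestDistance = d; best = candidate; }
        }

        /** Combine with a partial result from another task (the role of merge()). */
        void merge(Buffer other) {
            if (other.best != null && other.bestDistance < bestDistance) {
                bestDistance = other.bestDistance;
                best = other.best;
            }
        }

        /** Final result (the role of terminate()); null if no rows were seen. */
        String terminate() { return best; }
    }

    public static void main(String[] args) {
        // Two buffers stand in for two tasks that each see part of the column.
        Buffer left = new Buffer("kitten");
        for (String s : Arrays.asList("mitten", "sitting")) left.iterate(s);
        Buffer right = new Buffer("kitten");
        for (String s : Arrays.asList("kitchen", "bitten")) right.iterate(s);
        left.merge(right);
        System.out.println(left.terminate());
    }
}
```

Because the state is just the best candidate and its distance, partial results can be merged in any order, which is what lets Hive scan the table itself instead of the function opening its own connection. Ties are resolved in favour of whichever candidate is seen first, so in a distributed run the winner among equally close values is not deterministic.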

Another way is to create macros. They work in both Hive and Beeline.

CREATE TEMPORARY MACRO fn_maskNull(input decimal(25,3))
CASE
    WHEN input IS NULL THEN 0 else input
END;

-- usage
select fn_maskNull(null), fn_maskNull(101);

More info:

https://medium.com/@gchandra/create-user-defined-functions-in-hive-beeline-ff965285d735

Yes - you can make the UDF permanent. For example:

CREATE FUNCTION MatchFinder AS 'com.mycompany.packagex.myudf' USING JAR 'hdfs:///an_HDFS_directory/my_jar_name.jar';

This will make your function permanent, and anyone will be able to call it. In this case, the jar file is stored on HDFS for easy accessibility, but there are other options.

See the Hive wiki for more details.
