How to register Scala UDF in spark-sql, not Spark-Scala?
It looks like a regular Hive statement should work. In my script.sql, which I run via

spark-sql --jars mylib.jar myscript.sql

I have:
CREATE TEMPORARY FUNCTION rank AS 'com.mycompany.udf.Custom.rankFunc';
...
CREATE TEMPORARY VIEW MyTable AS (
SELECT
rank(id) AS rank,
...
In Scala code (mylib.jar):
package com.mycompany.udf
...
object Custom {
  def rankFunc(id: Long): Double = { Rank(id).rank }
  ...
}
However, the Hive code does not see this function.
18/01/23 17:38:25 ERROR SparkSQLDriver: Failed in [
CREATE TEMPORARY FUNCTION rank AS 'com.mycompany.udf.Custom.rankFunc']
java.lang.ClassNotFoundException: com.mycompany.udf.Custom.rankFunc
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
How should I change the code in my Scala library?
You're getting this error because Hive expects a function to be a class, not a method name.
Change your Scala code (UDF) to:
package com.mycompany.udf
class RankFunc extends org.apache.hadoop.hive.ql.exec.UDF {
  def evaluate(id: Long): Double = { Rank(id).rank }
}
... and the SQL script to:
CREATE TEMPORARY FUNCTION rankFunc AS 'com.mycompany.udf.RankFunc'
...
Here are examples of how to create a custom UDF with Java and Scala.
Because there is a lot of confusion, I am updating my answer:
Here is the Java code for the md5 UDF:
package org.apache.hadoop.hive.ql.udf;

import org.apache.commons.codec.digest.DigestUtils;
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDF;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;

public class UDFMd5 extends UDF {

  private final Text result = new Text();

  /**
   * Convert String to md5
   */
  public Text evaluate(Text n) {
    if (n == null) {
      return null;
    }
    String str = n.toString();
    String md5Hex = DigestUtils.md5Hex(str);
    result.set(md5Hex);
    return result;
  }
}
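Aside from the Hive `UDF` wrapper, the evaluate logic above is just hex-encoding an MD5 digest. As a sketch, the same computation can be reproduced with only the JDK (no `commons-codec` dependency); `Md5Demo` and `md5Hex` are illustrative names here, not part of the Hive API:

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;

public class Md5Demo {
    // Hex-encode the MD5 digest of a string, mirroring DigestUtils.md5Hex.
    static String md5Hex(String s) throws Exception {
        byte[] digest = MessageDigest.getInstance("MD5")
                .digest(s.getBytes(StandardCharsets.UTF_8));
        // BigInteger(1, ...) treats the bytes as unsigned;
        // %032x left-pads with zeros to the full 32 hex characters.
        return String.format("%032x", new BigInteger(1, digest));
    }

    public static void main(String[] args) throws Exception {
        System.out.println(md5Hex("hello")); // 5d41402abc4b2a76b9719d911017c592
    }
}
```

This is handy for checking, outside of Hive or Spark, that the value your UDF returns matches what you expect.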
I took the same jar used in Hive and was able to make it work. This worked for me.
In Hive I used:
create temporary function md5 AS 'org.apache.hadoop.hive.ql.udf.UDFMd5' USING JAR '/test/balaram/hive-MD5.jar';
In Spark I used:
create temporary function md5 AS 'org.apache.hadoop.hive.ql.udf.UDFMd5'
If this doesn't help, I am sorry.