Python UDF in Hive

I want to write a Hive UDF in Python that parses a name column (using https://pypi.python.org/pypi/nameparser ) and puts the parsed values into separate columns of a table (title, first, middle, last, suffix, nickname).

I am new to Python. I wrote the following Python code:

#!/usr/bin/python
import sys
from nameparser import HumanName
name = HumanName(name)
return name.title

In Hive, I run it like this:

add file title.py;
SELECT TRANSFORM (name) using 'title.py' AS (title STRING) from emp2;

But I am getting org.apache.hadoop.hive.ql.metadata.HiveException.

In the using clause of the select statement, you need to specify 'python title.py' instead of 'title.py', so that the script is run through the Python interpreter rather than executed directly:

add file title.py;
SELECT TRANSFORM (name) using 'python title.py' AS (title STRING) from emp2;
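
Beyond the using clause, note that a TRANSFORM script receives its rows on standard input (one row per line, columns separated by tabs) and must write its result rows to standard output. The script in the question never reads its input and uses return outside of a function, so it would likely still fail after the fix above. Below is a minimal sketch of what title.py could look like. The helper names (FIELDS, parse_name, transform) are made up for illustration, and it assumes nameparser is installed on every node that runs the script.

```python
#!/usr/bin/env python
"""Sketch of a Hive TRANSFORM script: one input row per stdin line,
one tab-separated output row per stdout line."""
import sys

# The attributes of nameparser.HumanName to emit, in output-column order.
FIELDS = ("title", "first", "middle", "last", "suffix", "nickname")


def parse_name(raw):
    """Return the six name parts of `raw` as a list of strings."""
    # Imported lazily so the row handling below can be tried out without
    # the third-party package; on the cluster it must be installed.
    from nameparser import HumanName
    parsed = HumanName(raw)
    return [getattr(parsed, field) for field in FIELDS]


def transform(lines, parser=parse_name, out=sys.stdout):
    """Parse each input line and write one tab-separated row per line."""
    for line in lines:
        raw = line.rstrip("\n")
        out.write("\t".join(parser(raw)) + "\n")


if __name__ == "__main__":
    transform(sys.stdin)
```

With a script like this, the AS clause of the query would list six STRING columns instead of only title, in the same order as FIELDS.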
