
How to create a UDF for Hive using Python with a 3rd party package like sklearn?

Original 2017-03-21 12:26:38 8 1 python/ hive/ package/ udf

I know how to create a Hive UDF with TRANSFORM and USING, but I can't use sklearn because not every node in the Hive cluster has sklearn installed.
I have an anaconda2.tar.gz that includes sklearn. What should I do?
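
For context, the TRANSFORM ... USING mechanism mentioned above pipes each row to an external script as a tab-separated line on stdin and reads tab-separated lines back from stdout. A minimal sketch of such a script is below. The names `transform_lines`, `count_fields` and `main` are hypothetical, and `count_fields` is only a stand-in for whatever scoring call (for example an sklearn model) would actually be used.

```python
import sys


def transform_lines(lines, score):
    """Yield one tab-separated output line per tab-separated input line.

    `score` is any callable that takes the list of string fields of a row
    and returns a value to append as an extra output column.
    """
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        yield "\t".join(fields + [str(score(fields))])


def count_fields(fields):
    # Hypothetical placeholder: a real script would call a model here.
    return len(fields)


def main(stdin=sys.stdin, stdout=sys.stdout):
    # Hive writes rows to the script's stdin and reads results from its stdout.
    for out in transform_lines(stdin, count_fields):
        stdout.write(out + "\n")

# When saved as a script for Hive, finish the file with a call to main().
```

On the Hive side this would typically be registered with `ADD FILE` and invoked through `SELECT TRANSFORM(...) USING 'python <script>' AS (...)`. The interpreter named in the USING clause is the one that must be able to import sklearn, which is the problem described in the question.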

1 answer

I recently started looking into this approach, and I don't think the problem is getting sklearn onto all the Hive nodes (as you mentioned above). It seems to me more of a compatibility issue than an 'sklearn node availability' one: I think sklearn is not (yet) designed to run as a parallel algorithm that can process large amounts of data in a short time.

What I'm trying to do, as an approach, is to connect Python to Hive through pyhive (for example) and make the necessary sklearn calls within that code. The rough assumption here is that this 'sklearn-hive-python' code will run on each node and deal with the data at the map-reduce level. I can't say (yet) that this is the right solution or the correct approach, but it is what I can conclude after searching for some time.
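
A rough sketch of the pyhive idea, under stated assumptions: the helper names (`connect_hive`, `fetch_features`, `predict_from_hive`) are hypothetical, the query is expected to return numeric feature columns only, and `model` is any already-fitted object with an sklearn-style `predict` method. Note that with pyhive the query executes in Hive but the rows are returned to the client process, so in this sketch the sklearn call runs wherever the script runs, not on each node.

```python
def connect_hive(host, port=10000, username=None):
    """Open a HiveServer2 connection through pyhive."""
    # Imported lazily so the helpers below can be tried without pyhive installed.
    from pyhive import hive
    return hive.Connection(host=host, port=port, username=username)


def fetch_features(cursor, query):
    """Run `query` on a DB-API cursor and return the rows as lists."""
    cursor.execute(query)
    return [list(row) for row in cursor.fetchall()]


def predict_from_hive(cursor, query, model):
    """Fetch feature rows from Hive and score them with `model.predict`.

    Assumption: `model` is a fitted estimator (for example from sklearn)
    and every selected column is a numeric feature.
    """
    rows = fetch_features(cursor, query)
    if not rows:
        return []
    return list(model.predict(rows))
```

Usage would look like `predict_from_hive(connect_hive("my-hive-host").cursor(), "SELECT f1, f2 FROM my_table", model)`, where the host, table and column names are placeholders. Because `fetchall` loads the whole result set into memory, this only suits result sets that fit on one machine.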

Installation of 3rd party package in Python

Using 3rd Party Libraries in Python

How to install a 3rd party module in Python?

jython can't find 3rd party python package (python installed using miniconda)

Python override 3rd party package single file

How can I include a 3rd party package when I build in Python?

How can I tell if a package/module is part of Python's std library? without a 3rd party library

Is there a 3rd party compiler for python?

How to create a UDF in HIVE using python for a Timestamp transformation

how to protect request information from 3rd party android apps using python





solution1 0 2017-04-08 03:46:20
