How do I use a pickled scikit-learn model in a Jython UDF for Pig?

Question

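I have trained a MultinomialNB model with scikit-learn, and now I want to unleash it on many JSON text files on an S3 cluster. I pickled the model (call it "nb.pickle"). How do I load and use it in a Pig script? Suppose I have a file with lines of text, each of which needs to be classified as spam or ham: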

    "im bored tonight, come chat with me",
    "hi good looking msg me sometime",
    "I'm walking the dog",
    "check me out",
    "I went to the store earlier",
    "here much at all but im always on there at i get on there alot more, my id is orangewolf77",
    "I like to play baseball",
    "what are you doing?",
    "i had a picture on my profile did u not see it?",
    "look at my b00bs",
    "go to my website http://we.scam.u
    "you are so pretty"

Answer 1

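Jython cannot use numpy, scipy, or scikit-learn, because they all rely on natively compiled extensions that Jython does not support. As a result, scikit-learn models cannot be used in Jython, nor can they be loaded there from a pickle file.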

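What you can do is inspect the code of the MNB class to find out which parameters to export (for example, to a JSON file), and then write a new predict method that computes the predictions from those fixed parameters in Jython.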
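A minimal sketch of that re-implementation, in pure Python so that it should also run under Jython. It assumes the parameters were dumped once under CPython from the fitted model's `classes_`, `class_log_prior_` and `feature_log_prob_` attributes, plus the `vocabulary_` of a CountVectorizer; the vectorizer, the JSON layout, the toy numbers and the names `load_params` and `predict` are all hypothetical and not part of the original question.

```python
import json
import math
import re

# Hypothetical parameter dump with made-up values. In practice this JSON would
# be written once under CPython from a fitted model (classes_, class_log_prior_,
# feature_log_prob_) and from a CountVectorizer's vocabulary_.
PARAMS_JSON = json.dumps({
    "classes": ["ham", "spam"],
    "class_log_prior": [math.log(0.5), math.log(0.5)],
    "vocabulary": {"dog": 0, "store": 1, "msg": 2, "website": 3},
    "feature_log_prob": [
        [math.log(0.4), math.log(0.4), math.log(0.1), math.log(0.1)],  # ham
        [math.log(0.1), math.log(0.1), math.log(0.4), math.log(0.4)],  # spam
    ],
})

# Same shape as CountVectorizer's default token pattern: words of 2+ characters.
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def load_params(text):
    """Parse the exported parameters from a JSON string."""
    return json.loads(text)


def predict(params, line):
    """Return the class maximising log prior + sum(count * log P(token | class))."""
    counts = {}
    for token in TOKEN_RE.findall(line.lower()):
        index = params["vocabulary"].get(token)
        if index is not None:  # tokens outside the vocabulary are ignored
            counts[index] = counts.get(index, 0) + 1

    best_label, best_score = None, None
    for c, label in enumerate(params["classes"]):
        score = params["class_log_prior"][c]
        row = params["feature_log_prob"][c]
        for index, n in counts.items():
            score += n * row[index]
        if best_score is None or score > best_score:
            best_label, best_score = label, score
    return best_label
```

Inside a Pig Jython UDF file, `predict` could then be wrapped in a function carrying Pig's `@outputSchema("label:chararray")` decorator, with the JSON file shipped alongside the script. The tokenisation only matches the real model if the original vectorizer used its default settings, so that part would need checking against the training code.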

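Alternatively, you can install CPython, numpy, scipy, and scikit-learn on the Hadoop nodes (for example, using the Anaconda distribution) and call scikit-learn through the Hadoop streaming interface.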
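A rough sketch of what such a streaming mapper might look like under CPython. It assumes that "nb.pickle" holds a fitted scikit-learn Pipeline (vectorizer plus MultinomialNB) whose `predict` accepts raw strings; a bare MultinomialNB would need its vectorizer pickled and applied as well. The function names and the tab-separated output format are illustrative choices, not something the answer specifies.

```python
import pickle
import sys


def load_model(path):
    # Assumption: the pickle contains a fitted Pipeline (vectorizer + classifier),
    # so model.predict() can be called directly on a list of raw text lines.
    with open(path, "rb") as f:
        return pickle.load(f)


def _flush(model, batch, out):
    # One predict() call per batch keeps the per-line overhead low.
    for text, label in zip(batch, model.predict(batch)):
        out.write("%s\t%s\n" % (label, text))


def classify_lines(model, lines, out, batch_size=1000):
    """Read text lines, skip empty ones, and write 'label<TAB>text' per line."""
    batch = []
    for line in lines:
        text = line.rstrip("\n")
        if text:
            batch.append(text)
        if len(batch) >= batch_size:
            _flush(model, batch, out)
            batch = []
    if batch:
        _flush(model, batch, out)


def main(model_path="nb.pickle"):
    # Hadoop streaming feeds input records on stdin and collects stdout.
    classify_lines(load_model(model_path), sys.stdin, sys.stdout)
```

Used as a mapper, the script would end with a call to `main()`, and the pickle would have to be shipped to the nodes with the job (for example through the streaming `-files` option). The scikit-learn version on the nodes should match the one used for training, since pickles are not guaranteed to load across versions.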

Answer 1: 1 accepted, 2015-03-27 13:02:25
