Using scikit-learn in Julia through PyCall

I'm trying to use scikit-learn in Julia through PyCall.

As a start, I'm trying to read the iris data into a Julia data structure.

This is the code in Python:

from sklearn import datasets
from sklearn.naive_bayes import GaussianNB

iris = datasets.load_iris()

X = iris.data
y = iris.target

The PyCall documentation says Python methods are called in Julia like, for example:

my_dna[:find]("ACT")

as opposed to:

my_dna.find("ACT")

in Python.

My attempt to import the iris data in Julia is:

using PyCall
@pyimport sklearn.datasets as datasets
@pyimport sklearn.naive_bayes as NB

iris = datasets.load_iris()

X = ...?
Y = ...?

The iris = datasets.load_iris() call works, and iris is then of type Dict{Any,Any}.

I'm not sure if this is correct. I tried iris = datasets[:load_iris] instead, but this results in:

ERROR: LoadError: MethodError: no method matching getindex(::Module, ::Symbol)

Going further, how would I read iris.data and iris.target into X and Y?

As you say, Julia tells you what type iris is:

julia v0.5> @pyimport sklearn.datasets as datasets

julia v0.5> @pyimport sklearn.naive_bayes as NB

julia v0.5> iris = datasets.load_iris()
Dict{Any,Any} with 5 entries:
  "feature_names" => Any["sepal length (cm)","sepal width (cm)","petal length (…
  "target_names"  => PyObject array(['setosa', 'versicolor', 'virginica'], …
  "data"          => [5.1 3.5 1.4 0.2; 4.9 3.0 1.4 0.2; … ; 6.2 3.4 5.4 2.3; 5.…
  "target"        => [0,0,0,0,0,0,0,0,0,0  …  2,2,2,2,2,2,2,2,2,2]
  "DESCR"         => "Iris Plants Database\n====================\n\nNotes\n----…

It also tells you what the keys in the dictionary are. So now you just use Julia's syntax for accessing values in a dictionary (result snipped):

julia v0.5> X = iris["data"]
150×4 Array{Float64,2}:
 5.1  3.5  1.4  0.2
 4.9  3.0  1.4  0.2
 4.7  3.2  1.3  0.2

julia v0.5> Y = iris["target"]
150-element Array{Int64,1}:
 0
 0
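
On why iris.data in Python becomes iris["data"] in Julia: as far as I know, load_iris() returns a scikit-learn Bunch, which is a dict subclass that also exposes its keys as attributes, and PyCall hands it to Julia as a Dict (as the output above shows). Below is a minimal Python sketch of that idea. MiniBunch is a hypothetical stand-in I made up for illustration, not scikit-learn's actual implementation, and the two rows are just the first rows printed above.

```python
class MiniBunch(dict):
    """Hypothetical stand-in: a dict whose keys can also be read as attributes."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails; fall back to the keys.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)


iris_like = MiniBunch(
    data=[[5.1, 3.5, 1.4, 0.2], [4.9, 3.0, 1.4, 0.2]],  # first two rows from above
    target=[0, 0],
)

# Attribute access and key access return the very same objects.
same_data = iris_like.data is iris_like["data"]
same_target = iris_like.target is iris_like["target"]

# Underneath it is still a dict, which is what a key-based view relies on.
is_dict = isinstance(iris_like, dict)
print(same_data, same_target, is_dict)
```

So once the object has been converted to a plain dictionary, only the key-based form remains, which is why X = iris["data"] is the Julia counterpart of X = iris.data.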

Note that I did not know the answer to this question. I just let Julia guide me as to what to do.

Finally, as @ChrisRackauckas suggested, there is already a Julia package that wraps scikit-learn: https://github.com/cstjean/ScikitLearn.jl

Since PyCall's syntax has changed, I'd like to add the current syntax (as of PyCall version 1.91.4) in addition to David's answer.

The Python code

from sklearn import datasets
from sklearn.naive_bayes import GaussianNB

iris = datasets.load_iris()

X = iris.data
y = iris.target

becomes in Julia:

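using PyCall

# pyimport returns a PyObject; its attributes are reached with dot syntax
datasets = pyimport("sklearn.datasets")
# equivalent of `from sklearn.naive_bayes import GaussianNB`
GaussianNB = pyimport("sklearn.naive_bayes").GaussianNB

iris = datasets.load_iris()
X = iris["data"]
y = iris["target"]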

