Scikit-learn DictVectorizer to Classifier

Question

I am trying to load a dictionary, and then perform classification. However, I get the error:

  File "train_classifier.py", line 49, in <module>
    clf.fit(page_vecs.data[:-1],page_vecs.target[:-1])
  File "/usr/local/lib/python3.4/site-packages/scipy/sparse/base.py", line 505, in __getattr__
    raise AttributeError(attr + " not found")
AttributeError: target not found

How can I load the targets? Here is my code:

vec = DictVectorizer()
page_vecs = vec.fit_transform(feature_dict_list)
clf = svm.SVC(gamma=0.001, C=100)
clf.fit(page_vecs.data[:-1],page_vecs.target[:-1])
print(clf.predict(page_vecs[-1]))

Answer 1

Look at the DictVectorizer class, specifically its fit_transform method:

Returns:
Xa : {array, sparse matrix}

Feature vectors; always 2-d.

So it returns a 2-d array or, with the default sparse=True, a 2-d SciPy sparse matrix.
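
A quick way to check this is the sketch below. The contents of feature_dict_list are made up for illustration; the question does not show them. With the default sparse=True the result is a SciPy sparse matrix, and its .data attribute is a flat array of stored values rather than the feature matrix:

```python
from scipy import sparse
from sklearn.feature_extraction import DictVectorizer

# Hypothetical stand-in for the question's feature_dict_list.
feature_dict_list = [
    {"length": 120, "links": 3},
    {"length": 45, "links": 0},
    {"length": 300, "links": 12},
]

vec = DictVectorizer()  # sparse=True is the default
page_vecs = vec.fit_transform(feature_dict_list)

is_sparse = sparse.issparse(page_vecs)     # a SciPy sparse matrix, not an ndarray
shape = page_vecs.shape                    # (n_samples, n_features)
has_target = hasattr(page_vecs, "target")  # the matrix carries no labels
data_ndim = page_vecs.data.ndim            # .data is 1-d: only the stored values
```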

In your code, this line:

page_vecs = vec.fit_transform(feature_dict_list)

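This makes page_vecs such a 2-d matrix (a SciPy sparse matrix by default, as the scipy/sparse/base.py frame in your traceback shows). Neither a NumPy array nor a sparse matrix has a target attribute, which you try to use here: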

clf.fit(page_vecs.data[:-1],page_vecs.target[:-1])

That is why you get the error. You shouldn't use .data either: on a sparse matrix it is the flat array of stored values, not the feature matrix. Work with the matrix returned by fit_transform directly. If you want to ignore the last row, do:

page_vecs[:-1, :]

Your labels (or targets) have nothing to do with the DictVectorizer class, which only vectorizes your samples, not your labels. You should keep a separate vector for the labels, with one entry per sample, and pass it as the second argument to clf.fit.
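
Putting it together, here is a minimal sketch of what the corrected script might look like. The feature dicts and the labels list are invented for illustration; in your case the labels have to come from wherever your page data records them:

```python
from sklearn import svm
from sklearn.feature_extraction import DictVectorizer

# Hypothetical data: one feature dict and one label per page.
feature_dict_list = [
    {"length": 120, "links": 3},
    {"length": 45, "links": 0},
    {"length": 300, "links": 12},
    {"length": 60, "links": 1},
    {"length": 280, "links": 10},
]
labels = ["short", "short", "long", "short", "long"]  # assumed, not from the question

vec = DictVectorizer()
page_vecs = vec.fit_transform(feature_dict_list)

# Train on every row except the last; slice the matrix and the labels the same way.
clf = svm.SVC(gamma=0.001, C=100)
clf.fit(page_vecs[:-1], labels[:-1])

# Slicing with -1: keeps the held-out row 2-d, which predict() expects.
prediction = clf.predict(page_vecs[-1:])
print(prediction)
```

The key point is that the same row slice is applied to both the feature matrix and the label list, so sample i always lines up with label i.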

solution1
1 ACCEPTED 2015-03-08 10:05:47