Understanding DictVectorizer in scikit-learn?

Question

I'm exploring the different feature extraction classes that scikit-learn provides. After reading the documentation, I still don't understand what DictVectorizer can be used for. Other questions come to mind as well. For example, how can DictVectorizer be used for text classification, i.e. how does this class help handle labelled textual data? Could anybody provide a short example other than the one on the documentation web page?

Answer 1

Say your feature space is length, width and height, and you have 3 observations, i.e. you measure the length, width and height of 3 objects:

       length  width  height
obs.1       1      0       2
obs.2       0      1       1
obs.3       3      2       1

Another way to show this is to use a list of dictionaries:

[{'height': 1, 'length': 0, 'width': 1},   # obs.2
 {'height': 2, 'length': 1, 'width': 0},   # obs.1
 {'height': 1, 'length': 3, 'width': 2}]   # obs.3

DictVectorizer goes the other way around, i.e. given the list of dictionaries, it builds the table above (with the columns in sorted key order):

>>> from sklearn.feature_extraction import DictVectorizer
>>> v = DictVectorizer(sparse=False)
>>> d = [{'height': 1, 'length': 0, 'width': 1},
...      {'height': 2, 'length': 1, 'width': 0},
...      {'height': 1, 'length': 3, 'width': 2}]
>>> v.fit_transform(d)
array([[ 1.,  0.,  1.],   # obs.2
       [ 2.,  1.,  0.],   # obs.1
       [ 1.,  3.,  2.]])  # obs.3
   # height, len., width
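
For the text classification part of the question, one possible approach is to turn each document into a dictionary of token counts and let DictVectorizer build the feature matrix. The following is only a minimal sketch: the toy corpus, the labels, the `token_counts` helper and the choice of `MultinomialNB` are all assumptions made for illustration, not something taken from the documentation example.

```python
from collections import Counter

from sklearn.feature_extraction import DictVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus and labels, invented for this illustration.
docs = ["good great film",
        "great acting good plot",
        "bad boring film",
        "boring plot bad acting"]
labels = ["pos", "pos", "neg", "neg"]


def token_counts(text):
    """Hypothetical helper: map a document to a {token: count} dictionary."""
    return Counter(text.split())


# One dictionary per document; keys are tokens, values are counts.
dicts = [token_counts(doc) for doc in docs]

# sparse=True is the default, which suits large vocabularies.
vec = DictVectorizer()
X = vec.fit_transform(dicts)

# Columns are the sorted dictionary keys seen during fit.
print(vec.feature_names_)
# ['acting', 'bad', 'boring', 'film', 'good', 'great', 'plot']

# Any classifier that accepts sparse input can be trained on X.
clf = MultinomialNB().fit(X, labels)

# New documents must go through the same vectorizer; tokens that were
# not seen during fit are silently ignored.
new = vec.transform([token_counts("great plot twist")])
pred = clf.predict(new)
print(pred[0])
```

Keys that are missing from a dictionary simply become zeros in that row, and string values (rather than numbers) are one-hot encoded into columns named `key=value`, which is why the class is also handy for categorical features.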

solution1
18 2014-12-15 01:43:09