Data vectorization (get_dummies 3 columns to matrix)

Question

I have a task that I need to solve efficiently. I have 50 categories and 10,000 stores that can stock products from these categories, but all of this is stored in just 3 columns:

id_store  category    qnty
    1         1        50
    1         2        32
    1         15       44
    2         1        333
    2         4        33
    2         5        15
    2         15       12
    2         35       14
    3         3        14
    ....

I need to turn this into a matrix where each row is an id_store, each column is a category, and each cell holds the qnty for that pair (0 where a store has nothing in that category):

id_store/category 1   2   3   4 ...15  16... 35   36
   1              50  32  0   0    44  0     0    0
   2              333 0   0   33   12  0     14   0
   3              0   0   14  0    0   0     0    0

Answer 1

You could use pandas, a library designed for tabular data like yours. The pandas documentation for DataFrame.pivot includes this example:

>>> import pandas as pd
>>> df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two',
...                            'two'],
...                    'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
...                    'baz': [1, 2, 3, 4, 5, 6],
...                    'zoo': ['x', 'y', 'z', 'q', 'w', 't']})

>>> df
    foo   bar  baz  zoo
0   one   A    1    x
1   one   B    2    y
2   one   C    3    z
3   two   A    4    q
4   two   B    5    w
5   two   C    6    t

>>> df.pivot(index='foo', columns='bar', values='baz')
bar  A   B   C
foo
one  1   2   3
two  4   5   6
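
Applied to the data in the question, the same idea might look like the sketch below. It uses pivot_table rather than pivot so that missing store/category pairs can be filled with 0 and repeated pairs are summed instead of raising an error. The column names come from the question; the assumption that categories are numbered 1 to 50 is mine, and the reindex step only exists to add columns for categories that never appear in the data.

```python
import pandas as pd

# Sample rows from the question: one row per (store, category) pair.
df = pd.DataFrame({
    "id_store": [1, 1, 1, 2, 2, 2, 2, 2, 3],
    "category": [1, 2, 15, 1, 4, 5, 15, 35, 3],
    "qnty":     [50, 32, 44, 333, 33, 15, 12, 14, 14],
})

# Rows become stores, columns become categories, cells hold qnty.
# aggfunc="sum" adds up repeated (store, category) pairs;
# fill_value=0 puts 0 where a store has no entry for a category.
matrix = df.pivot_table(index="id_store", columns="category",
                        values="qnty", aggfunc="sum", fill_value=0)

# Assumption: categories are numbered 1..50. Reindexing adds all-zero
# columns for categories that do not occur in the data.
matrix = matrix.reindex(columns=range(1, 51), fill_value=0)

print(matrix.shape)
```

If the full 10,000 x 50 table is mostly zeros, a dense DataFrame like this is still small enough to hold in memory, so this is probably the simplest route.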

Answer 2

You can use SciPy sparse matrices to do this; see the scipy.sparse documentation.

from scipy import sparse
D = sparse.coo_matrix((qnty, (id_store, category)))  # builds a sparse matrix from 1-D NumPy arrays (np.ndarray)

If you want to convert it to a dense np.ndarray, just use:

D = D.toarray()

Or if you prefer the numpy np.matrix type, just use:

D = D.todense()
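
As a rough end-to-end sketch with the sample rows from the question: because coo_matrix treats the two index arrays as zero-based positions, ids that start at 1 would otherwise leave an empty row 0 and column 0, so the ids are shifted down by one here. The explicit shape (3 stores, 50 categories) is an assumption for this sample; for the full data it would be (10000, 50).

```python
import numpy as np
from scipy import sparse

# Sample rows from the question as three parallel 1-D arrays.
id_store = np.array([1, 1, 1, 2, 2, 2, 2, 2, 3])
category = np.array([1, 2, 15, 1, 4, 5, 15, 35, 3])
qnty = np.array([50, 32, 44, 333, 33, 15, 12, 14, 14])

# Assumption: ids are consecutive and start at 1, so id - 1 is the
# zero-based row/column position.
n_stores, n_categories = 3, 50

D = sparse.coo_matrix(
    (qnty, (id_store - 1, category - 1)),
    shape=(n_stores, n_categories),
)

# Dense view: row i is store i + 1, column j is category j + 1.
# Repeated (row, column) pairs are summed during the conversion.
dense = D.toarray()
print(dense.shape)
```

If store ids are not consecutive integers, they would first need to be mapped to row positions, for example with np.unique(id_store, return_inverse=True).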

2 answers

solution1
0 ACCEPTED 2018-06-09 10:28:28

solution2
0 2018-06-09 10:56:46
