Data vectorization (get_dummies 3 columns to matrix)

I have a task that needs an efficient solution. There are 50 categories and 10,000 stores that can carry products from these categories, but the data is stored in just three columns:

id_store  category    qnty
    1         1        50
    1         2        32
    1         15       44
    2         1        333
    2         4        33
    2         5        15
    2         15       12
    2         35       14
    3         3        14
    ....     

I need to build a matrix from this, where each row is an id_store, each column is a category, and each cell holds the corresponding qnty (0 where a store has nothing in that category):

id_store/category 1   2   3   4 ...15  16... 35   36
   1              50  32  0   0    44  0     0    0
   2              333 0   0   33   12  0     14   0
   3              0   0   14  0    0   0     0    0     

You could use pandas, a library designed for tabular data like yours. The pandas documentation gives this example of DataFrame.pivot:

>>> df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two',
...                            'two'],
...                    'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
...                    'baz': [1, 2, 3, 4, 5, 6],
...                    'zoo': ['x', 'y', 'z', 'q', 'w', 't']})

>>> df
    foo   bar  baz  zoo
0   one   A    1    x
1   one   B    2    y
2   one   C    3    z
3   two   A    4    q
4   two   B    5    w
5   two   C    6    t

>>> df.pivot(index='foo', columns='bar', values='baz')
bar  A   B   C
foo
one  1   2   3
two  4   5   6
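Applied to the data in the question, the same idea might look like the sketch below. It uses pivot_table rather than pivot, because pivot has no way to fill missing pairs with 0 and raises an error if a store/category pair appears twice. The sample rows are copied from the question. The assumption that categories are numbered 1 to 50 is mine, so adjust the range if your category ids differ.

```python
import pandas as pd

# Sample rows from the question, in long format: one row per (store, category) pair.
df = pd.DataFrame({
    "id_store": [1, 1, 1, 2, 2, 2, 2, 2, 3],
    "category": [1, 2, 15, 1, 4, 5, 15, 35, 3],
    "qnty":     [50, 32, 44, 333, 33, 15, 12, 14, 14],
})

# pivot_table sums repeated (store, category) pairs, and fill_value=0
# puts a 0 wherever a store has no rows for a category.
wide = df.pivot_table(index="id_store", columns="category", values="qnty",
                      aggfunc="sum", fill_value=0)

# Assumption: categories are numbered 1..50. Reindexing adds the categories
# that never occur in the data as all-zero columns.
wide = wide.reindex(columns=range(1, 51), fill_value=0)

print(wide.shape)
```

With 10,000 stores and 50 categories the dense result has only 500,000 cells, so it should fit in memory comfortably.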

You can use SciPy sparse matrices to do this; see the scipy.sparse documentation.

import scipy.sparse
D = scipy.sparse.coo_matrix((qnty, (id_store, category)))  # creates a sparse matrix from 1-D NumPy arrays (np.ndarray): values, then (row indices, column indices)

If you want to make it a dense np.ndarray, just use:

D = D.toarray()

Or, if you prefer NumPy's np.matrix type, just use:

D = D.todense()
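A self-contained sketch of the sparse approach with the question's sample rows is below. One detail to watch: coo_matrix treats the ids as zero-based row and column indices, so using the raw ids would leave row 0 and column 0 empty. Here I subtract 1 from both. The explicit shape (3 stores, 50 categories) is my assumption for this sample; without it the shape is inferred from the largest index present.

```python
import numpy as np
from scipy import sparse

# Sample rows from the question, one array per column.
id_store = np.array([1, 1, 1, 2, 2, 2, 2, 2, 3])
category = np.array([1, 2, 15, 1, 4, 5, 15, 35, 3])
qnty = np.array([50, 32, 44, 333, 33, 15, 12, 14, 14])

# Assumed dimensions for this sample: 3 stores, categories numbered 1..50.
n_stores, n_categories = 3, 50

# Shift the 1-based ids to 0-based indices: store 1 -> row 0, category 1 -> column 0.
D = sparse.coo_matrix(
    (qnty, (id_store - 1, category - 1)),
    shape=(n_stores, n_categories),
)

# Converting to a dense array; repeated (row, column) pairs would be summed here.
dense = D.toarray()
print(dense.shape)
```

If later steps can work on sparse input, D.tocsr() keeps the matrix sparse and supports efficient row slicing.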
