I have a task that must be solved optimally. I have 50 categories and 10,000 stores that can carry products from these categories, but all of the data is stored in 3 columns:
id_store category qnty
1 1 50
1 2 32
1 15 44
2 1 333
2 4 33
2 5 15
2 15 12
2 35 14
3 3 14
....
I need to build a matrix from this, where the rows are id_store, the columns are category, and each cell holds qnty (0 where a store has no entry for that category):
id_store/category 1 2 3 4 ... 15 16 ... 35 36
1 50 32 0 0 44 0 0 0
2 333 0 0 33 12 0 14 0
3 0 0 14 0 0 0 0 0
You could use pandas, which is a library specifically designed for dataframes like yours. I found this example in the pandas documentation:
>>> import pandas as pd
>>> df = pd.DataFrame({'foo': ['one', 'one', 'one', 'two', 'two', 'two'],
... 'bar': ['A', 'B', 'C', 'A', 'B', 'C'],
... 'baz': [1, 2, 3, 4, 5, 6],
... 'zoo': ['x', 'y', 'z', 'q', 'w', 't']})
>>> df
foo bar baz zoo
0 one A 1 x
1 one B 2 y
2 one C 3 z
3 two A 4 q
4 two B 5 w
5 two C 6 t
>>> df.pivot(index='foo', columns='bar', values='baz')
bar A B C
foo
one 1 2 3
two 4 5 6
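Applied to the data in the question, the same call might look like the sketch below. The sample rows are copied from the question. The final reindex step assumes the categories are numbered 1 to 50, which the question does not state explicitly, so adjust the range if that does not hold.

```python
import pandas as pd

# Sample rows copied from the question (long format: one row per store/category pair).
df = pd.DataFrame({
    "id_store": [1, 1, 1, 2, 2, 2, 2, 2, 3],
    "category": [1, 2, 15, 1, 4, 5, 15, 35, 3],
    "qnty":     [50, 32, 44, 333, 33, 15, 12, 14, 14],
})

# pivot() leaves NaN where a store has no row for a category; replace those with 0.
wide = (
    df.pivot(index="id_store", columns="category", values="qnty")
      .fillna(0)
      .astype(int)
)

# Assumption: categories are numbered 1..50. Reindexing adds an all-zero
# column for every category that never occurs in the data.
wide = wide.reindex(columns=range(1, 51), fill_value=0)
```

Note that pivot() raises a ValueError if the same (id_store, category) pair appears more than once. In that case, df.pivot_table(index="id_store", columns="category", values="qnty", aggfunc="sum", fill_value=0) is one way to add the duplicates together instead.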
You can also use SciPy sparse matrices to do this; see the scipy.sparse documentation.
from scipy.sparse import coo_matrix
D = coo_matrix((qnty, (id_store, category)))  # builds a sparse matrix from 1-D NumPy arrays (np.ndarray)
If you want to make it a dense np.ndarray, just use:
D = D.toarray()
Or, if you prefer the NumPy np.matrix type, just use:
D = D.todense()
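As a minimal end-to-end sketch of the sparse approach, using the sample rows from the question: the IDs there start at 1 while matrix indices start at 0, so this version subtracts 1 from both. The explicit shape assumes 50 categories numbered 1 to 50 and takes the number of stores from the largest id_store in the sample. Both are assumptions to adapt to the real data.

```python
import numpy as np
from scipy.sparse import coo_matrix

# Sample rows copied from the question, one array per column.
id_store = np.array([1, 1, 1, 2, 2, 2, 2, 2, 3])
category = np.array([1, 2, 15, 1, 4, 5, 15, 35, 3])
qnty = np.array([50, 32, 44, 333, 33, 15, 12, 14, 14])

# Assumption: store IDs run 1..max and categories run 1..50.
n_stores = int(id_store.max())
n_categories = 50

# Shift the 1-based IDs to 0-based row/column indices.
D = coo_matrix(
    (qnty, (id_store - 1, category - 1)),
    shape=(n_stores, n_categories),
)

# Dense view: dense[i, j] is the quantity for store i + 1, category j + 1.
dense = D.toarray()
```

If the same (store, category) pair occurs more than once, the duplicate entries are summed when the COO matrix is converted with toarray().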