General Matrix computation in Python, TF-IDF

Question

While building a TF-IDF module, I ran into the following matrix-vector computation.

A % b = C

[[1,2], [3,4]] % [1/2, 1/3] = [[1/2, 2/3], [3/2, 4/3]]

Here A is a Documents x Words matrix, where A_ij is the term-frequency count of word j in document i. The vector b holds the pre-calculated IDF value for each word; for instance, b_j is 1/7 if word j appears in 7 different documents.

What is this column-wise multiplication called? And is there an existing Python library that supports it?

Because the matrix is large and sparse, I have been storing it as a scipy csr_matrix.
I tried converting it to an np.array and computing A*b, but it did not finish within a few minutes.

Answer 1

Use NumPy for it.

This is element-wise multiplication with broadcasting: b is applied to every row of A, so column j is scaled by b_j:

import numpy as np
A = np.array([[1, 2], [3, 4]])
b = np.array([1/2, 1/3])
print(A * b)

output:

[[ 0.5         0.66666667]
 [ 1.5         1.33333333]]
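To see why this counts as column-wise scaling, here is a small sketch comparing the broadcast product with right-multiplication by a diagonal matrix built from b. The names scaled and via_diag are just illustrative, and the diagonal form is shown only for comparison, not as something to use on large data.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]], dtype=float)
b = np.array([1 / 2, 1 / 3])

# Broadcasting stretches b across every row of A,
# so column j of the result is column j of A times b[j].
scaled = A * b

# Same result as right-multiplying A by diag(b).
via_diag = A @ np.diag(b)

print(np.allclose(scaled, via_diag))
```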

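For a csr_matrix, use multiply (todense() below is only there to print the small example; for a large matrix, keep the result sparse):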

from scipy.sparse import csr_matrix
x1 = csr_matrix([[1, 2], [3, 4]])
x2 = csr_matrix([1/2, 1/3])
print(x1.multiply(x2).todense())

output:

[[ 0.5         0.66666667]
 [ 1.5         1.33333333]]
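Since the question mentions a large sparse matrix, another option that never densifies anything is to right-multiply by a sparse diagonal matrix. This is a minimal sketch, assuming the IDF values are available as a 1-D NumPy array; the names tf, idf and tfidf are placeholders of mine, not from the question.

```python
import numpy as np
from scipy.sparse import csr_matrix, diags, issparse

# Toy term-frequency matrix (documents x words), stored sparse.
tf = csr_matrix(np.array([[1, 2], [3, 4]], dtype=float))

# One IDF weight per word (per column), assumed precomputed.
idf = np.array([1 / 2, 1 / 3])

# Right-multiplying by a sparse diagonal matrix scales column j by idf[j].
# The product of two sparse matrices is itself sparse.
tfidf = tf @ diags(idf)

print(issparse(tfidf))
print(tfidf.toarray())  # densify only for display of this tiny example
```

Only the stored nonzero entries are touched, so the cost should grow with the number of nonzeros rather than with the full documents x words shape.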
