
General Matrix computation in Python, TF-IDF

While building a TF-IDF module, I ran into this matrix-vector computation:

A % b = C

[[1,2], [3,4]] % [1/2, 1/3] = [[1/2, 2/3], [3/2, 4/3]]

Here A is a Document x Words matrix where A_ij is the term-frequency count of word j in document i, and b is a vector of pre-calculated IDF values, one per word. For instance, b_j is 1/7 if word j appears in 7 different documents.
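As a minimal sketch of how such a b vector could be derived from the term-frequency matrix itself, assuming the simple 1/(document count) weighting described above rather than the logarithmic IDF many libraries use. The toy matrix and the names tf, doc_freq and b are hypothetical:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical toy term-frequency matrix: 3 documents (rows) x 4 words (columns).
tf = csr_matrix(np.array([
    [2, 0, 1, 0],
    [0, 3, 1, 0],
    [1, 0, 1, 4],
]))

# Document frequency: how many documents have a nonzero count for each word.
doc_freq = tf.getnnz(axis=0)

# b_j = 1 / (number of documents containing word j).
# np.maximum guards against division by zero for words that never occur.
b = 1.0 / np.maximum(doc_freq, 1)

print(doc_freq)  # [2 1 3 1]
print(b)
```

Because getnnz only counts stored entries, this stays cheap even when the matrix is large and sparse.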

What is this column-wise multiplication called? And is there an existing Python library that supports this operation?

  • Because the matrix is large and sparse, I have been storing it as a csr_matrix from scipy.
  • I tried converting it to np.array and computing A*b, but it did not finish within a few minutes.

Use NumPy for it.

It is element-wise multiplication with broadcasting: b is applied to every row of A, so column j is scaled by b_j.

import numpy as np
A = np.array([[1, 2], [3, 4]])
b = np.array([1/2, 1/3])
print(A * b)

output:

[[ 0.5         0.66666667]
 [ 1.5         1.33333333]]
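On the naming question, the same result can also be described as column scaling, that is, right-multiplying A by a diagonal matrix built from b. A small check of that equivalence, using the values from the example above (the variable names broadcast and diagonal are only illustrative):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]], dtype=float)
b = np.array([1/2, 1/3])

broadcast = A * b            # b is stretched across every row of A
diagonal = A @ np.diag(b)    # right-multiplication by diag(b)

print(np.allclose(broadcast, diagonal))  # True
```

For dense arrays the broadcast form is preferable, since np.diag(b) allocates a full square matrix.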

For a csr_matrix, use its multiply method, which also broadcasts a single-row operand across all rows:

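from scipy.sparse import csr_matrix
x1 = csr_matrix([[1, 2], [3, 4]])
# x2 is a 1 x 2 sparse row holding the per-column factors
x2 = csr_matrix([1/2, 1/3])
# todense() is only for printing; skip it for large matrices
print(x1.multiply(x2).todense())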

output:

[[ 0.5         0.66666667]
 [ 1.5         1.33333333]]
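For a large matrix, one way to keep everything sparse is to multiply by a sparse diagonal matrix instead of converting to a dense array. This is a sketch under the assumption that b is a dense 1-D NumPy array with one entry per column; the example matrix is made up for illustration:

```python
import numpy as np
from scipy.sparse import csr_matrix, diags

# Hypothetical sparse term-frequency matrix: 2 documents x 3 words.
A = csr_matrix([[1, 0, 2], [0, 3, 0]], dtype=float)
b = np.array([1/2, 1/3, 1/4])

# diags(b) builds a sparse 3 x 3 diagonal matrix; the product scales
# column j of A by b[j] and touches only the stored nonzero entries.
C = (A @ diags(b)).tocsr()

print(C.toarray())
```

The result has the same sparsity pattern as A, so memory use stays proportional to the number of nonzero counts.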

