
Improving for loop speed for numpy.ndarray

I'm computing the mutual information for unigrams in a dataset, and I'd like to speed up a loop over a numpy ndarray. The code below builds the PMI matrix from an existing matrix 'C' with 6018 rows and 27721 columns. Any ideas on how to speed up the for loop? It currently takes almost 4 hours to run. I read in another post about using Cython, but are there any alternatives? Thanks in advance for your help.

# MAKE MUTUAL INFO MATRIX, PMI
import numpy as np
from math import log10

print "Creating mutual information matrix"
N = C.sum()
# Replaced divide by N with multiply by invN in the formula below.
# 1.0 forces float division in case C holds integer counts (Python 2).
invN = 1.0/N
PMI = np.zeros(C.shape)
row, col = C.shape
for r in xrange(row):  # u
    for c in xrange(r):  # w
        if C[r,c]!=0:  # if they co-occur
            numerator = C[r,c]*invN  # number of reviews where u and w co-occur, multiplied by invN
            denominator = (sum(C[:,c])*invN) * (sum(C[r])*invN)
            pmi = log10(numerator*(1/denominator))
            PMI[r,c] = pmi
            PMI[c,r] = pmi

You should get faster speeds if you can scrap the loops and take advantage of NumPy's vectorisation instead.

I haven't tried it, but something like this should work:

numerator = C * invN
denominator = (np.sum(C, axis=0) * invN) * (np.sum(C, axis=1)[:,None] * invN)
pmi = np.log10(numerator * (1 / denominator))

Note that numerator, denominator, and pmi will each be arrays of values rather than scalars.
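To make the shapes concrete, here is a minimal sketch of the broadcasting step on a tiny matrix. The 2x3 values and the names col_sums, row_sums, and outer are made up for illustration and are not from the question.

```python
import numpy as np

# Hypothetical 2x3 count matrix, used only to illustrate the shapes.
C = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])

col_sums = np.sum(C, axis=0)           # shape (3,): one total per column
row_sums = np.sum(C, axis=1)[:, None]  # shape (2, 1): one total per row

# Broadcasting (3,) against (2, 1) gives a (2, 3) array whose
# entry [r, c] equals row_sums[r] * col_sums[c].
outer = col_sums * row_sums
```

The [:, None] turns the row totals into a column vector, which is what lets a single multiplication replace the per-element sum(C[:,c]) and sum(C[r]) calls in the loop.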

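Also, you might have to deal with the C == 0 case somehow, since log10(0) is -inf. One option is to compute only the nonzero entries (note that boolean indexing like this returns a flattened 1-D array rather than a matrix):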

pmi = np.log10(numerator[numerator != 0] * (1 / denominator[numerator != 0]))
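If the result needs to keep the same 2-D shape as C, one possible approach is to write the masked values back into a zero-filled array. The sketch below assumes C holds non-negative counts, so a nonzero entry implies nonzero row and column totals. The helper name pmi_matrix is hypothetical.

```python
import numpy as np

def pmi_matrix(C):
    """Hypothetical helper: PMI of C, with 0 wherever C is 0."""
    C = np.asarray(C, dtype=float)
    invN = 1.0 / C.sum()
    numerator = C * invN
    denominator = (np.sum(C, axis=0) * invN) * (np.sum(C, axis=1)[:, None] * invN)
    mask = numerator != 0
    pmi = np.zeros(C.shape)
    # Only the co-occurring entries are passed to log10, so no -inf appears.
    pmi[mask] = np.log10(numerator[mask] * (1 / denominator[mask]))
    return pmi

example = pmi_matrix([[2, 0],
                      [1, 1]])
```

This matches the original loop's behaviour of leaving PMI at zero where the words never co-occur.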

As Blckknght pointed out in the comments, you could leave out some of the invN multiplications:

denominator = np.sum(C, axis=0) * np.sum(C, axis=1)[:,None] * invN
pmi = np.log10(C * (1 / denominator))
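As a quick sanity check, the shortened form can be compared against the full one on a small random matrix. This is only a sketch: the 4x5 size, the seed, and the strictly positive entries (chosen to avoid the C == 0 case) are arbitrary assumptions.

```python
import numpy as np

rng = np.random.RandomState(0)
C = rng.randint(1, 10, size=(4, 5)).astype(float)  # strictly positive counts
invN = 1.0 / C.sum()

# Full form, with invN applied to the numerator and both marginals.
full_denominator = (np.sum(C, axis=0) * invN) * (np.sum(C, axis=1)[:, None] * invN)
full = np.log10((C * invN) * (1 / full_denominator))

# Shortened form: one invN cancels between numerator and denominator.
denominator = np.sum(C, axis=0) * np.sum(C, axis=1)[:, None] * invN
short = np.log10(C * (1 / denominator))

same = np.allclose(full, short)
```

The two agree because one factor of invN in the numerator cancels against one of the two in the denominator.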
