
Numba slower than pure Python in frequency counting

Given a data matrix with discrete entries represented as a 2D numpy array, I'm trying to compute the observed frequencies of some features (the columns) only looking at some instances (the rows of the matrix).

I can do that quite easily with numpy by applying bincount to each column slice after some fancy indexing. In pure Python, using an external data structure as a count accumulator, it becomes a C-style double loop.

import numpy

import numba

try:
    from time import perf_counter
except ImportError:
    from time import time
    perf_counter = time


def estimate_counts_numpy(data,
                          instance_ids,
                          feature_ids):
    """
    WRITEME
    """
    #
    # slicing the data array (probably memory consuming)
    curr_data_slice = data[instance_ids, :][:, feature_ids]

    estimated_counts = []
    for feature_slice in curr_data_slice.T:
        counts = numpy.bincount(feature_slice)
        #
        # checking just for the all 0 case:
        # this is not stable for not binary datasets TODO: fix it
        if counts.shape[0] < 2:
            counts = numpy.append(counts, [0], 0)
        estimated_counts.append(counts)

    return estimated_counts


@numba.jit(numba.types.int32[:, :](numba.types.int8[:, :],
                                   numba.types.int32[:],
                                   numba.types.int32[:],
                                   numba.types.int32[:],
                                   numba.types.int32[:, :]))
def estimate_counts_numba(data,
                          instance_ids,
                          feature_ids,
                          feature_vals,
                          estimated_counts):
    """
    WRITEME
    """

    #
    # actual counting
    for i, feature_id in enumerate(feature_ids):
        for instance_id in instance_ids:
            estimated_counts[i][data[instance_id, feature_id]] += 1

    return estimated_counts


if __name__ == '__main__':
    #
    # creating a large synthetic matrix, testing for performance
    rand_gen = numpy.random.RandomState(1337)
    n_instances = 2000
    n_features = 2000
    large_matrix = rand_gen.binomial(1, 0.5, (n_instances, n_features))
    #
    # random indexes too
    n_sample = 1000
    rand_instance_ids = rand_gen.choice(n_instances, n_sample, replace=False)
    rand_feature_ids = rand_gen.choice(n_features, n_sample, replace=False)
    binary_feature_vals = [2 for i in range(n_features)]
    #
    # testing
    numpy_start_t = perf_counter()

    e_counts_numpy = estimate_counts_numpy(large_matrix,
                                           rand_instance_ids,
                                           rand_feature_ids)
    numpy_end_t = perf_counter()
    print('numpy done in {0} secs'.format(numpy_end_t - numpy_start_t))

    binary_feature_vals = numpy.array(binary_feature_vals)
    #
    #
    curr_feature_vals = binary_feature_vals[rand_feature_ids]
    #
    # creating a data structure to hold the slices
    # (with numba I cannot use list comprehension?)
    # e_counts_numba = [[0 for val in range(feature_val)]
    #                   for feature_val in
    #                   curr_feature_vals]
    e_counts_numba = numpy.zeros((n_sample, 2), dtype='int32')
    numba_start_t = perf_counter()

    estimate_counts_numba(large_matrix,
                          rand_instance_ids,
                          rand_feature_ids,
                          binary_feature_vals,
                          e_counts_numba)
    numba_end_t = perf_counter()
    print('numba done in {0} secs'.format(numba_end_t - numba_start_t))

These are the times I get while running the above code:

numpy done in 0.2863295429997379 secs
numba done in 11.55551904299864 secs

My point is that my implementation is even slower when I apply a jit with numba, so I strongly suspect I am messing something up.

The reason your function is slow is that Numba has fallen back to object mode to compile the loop.

There are two problems:

  1. Numba doesn't yet support chained indexing of multidimensional arrays, so you need to rewrite this:

estimated_counts[i][data[instance_id, feature_id]]

into this:

estimated_counts[i, data[instance_id, feature_id]]

  2. Your explicit type signature is incorrect. All of your input arrays are actually int64, not int8/int32. Rather than fixing the signature by hand, you can rely on Numba's lazy JIT to detect the argument types and compile the right version. All you have to do is change the decorator to just @numba.jit . Just make sure you call the function once before benchmarking if you don't want to include compilation time.

With these changes, I benchmark Numba to be about 15% faster than NumPy for this function.
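Putting both fixes together, here is a minimal sketch of the corrected kernel. The function name `estimate_counts_fixed` and the `ImportError` fallback are mine (the fallback just lets the sketch run without numba installed); the `feature_vals` argument is dropped since lazy compilation makes the explicit signature unnecessary:

```python
import numpy

# Assumption: fall back to plain Python if numba is absent, so the
# sketch still runs; with numba installed, nopython mode is enforced.
try:
    import numba
    jit = numba.jit(nopython=True)
except ImportError:
    jit = lambda f: f


@jit
def estimate_counts_fixed(data, instance_ids, feature_ids, estimated_counts):
    for i in range(feature_ids.shape[0]):
        for j in range(instance_ids.shape[0]):
            # single-bracket 2D indexing instead of the chained
            # estimated_counts[i][...] form that forced object mode
            estimated_counts[i, data[instance_ids[j], feature_ids[i]]] += 1
    return estimated_counts


# Tiny smoke test; call once before benchmarking to absorb compile time.
data = numpy.array([[0, 1], [1, 1], [0, 0]], dtype=numpy.int64)
rows = numpy.array([0, 1], dtype=numpy.int64)
cols = numpy.array([0, 1], dtype=numpy.int64)
counts = estimate_counts_fixed(data, rows, cols,
                               numpy.zeros((2, 2), dtype=numpy.int64))
```

Because all arrays are plain int64, type inference at the first call produces a single nopython-mode specialization, and subsequent calls run the compiled loop directly.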
