“Pointer being freed was not allocated” error using cython in parallel with an external c library based on FFTW

Question

What I'm attempting

I have code that was previously parallelised in Python using multiprocess. It worked okay, although it was slow and memory hungry, so I've decided to try converting it to Cython. I'm new to Cython and don't have much experience in C. The example below is as simplified as I can get it. It worked serially, but as soon as I parallelised it, it stopped working. Because it runs in parallel, I've gone through all my code and released the GIL (nogil).

The code relies on an external C library, https://github.com/astro-informatics/ssht/ (compilation instructions are in the README), which uses FFTW under the hood. This library has its own Cython file, which calls the same C function that I'm using (ssht_core_mw_inverse_sov_sym_ss). The function in that file which most closely resembles mine looks like this:

def ssht_inverse_mwss_complex(
    np.ndarray[ double complex, ndim=1, mode="c"] f_lm not None,
    int L,
    int spin
):
    cdef ssht_dl_method_t dl_method = SSHT_DL_RISBO
    f_mwss_c = np.empty([L+1,2*L,], dtype=complex)
    ssht_core_mw_inverse_sov_sym_ss(
        <double complex*> np.PyArray_DATA(f_mwss_c),
        <const double complex*> np.PyArray_DATA(f_lm),
        L,
        spin,
        dl_method,
        0
    )
    return f_mwss_c

I've essentially had to recreate this locally, as I needed a version that runs without the GIL.

The problem

When I run a script using the Cython module I get a segmentation fault, but the error is slightly different each time. Either there is no explanatory message at all, or it complains about pointer memory allocation:

malloc: *** error for object 0x7f83de8b58e0: pointer being freed was not allocated
python(21855,0x70000d333000) malloc: Double free of object 0x7f83de8b58e0
python(21855,0x70000d536000) malloc: *** set a breakpoint in malloc_error_break to debug

or it seems to be specific to FFTW:

fftw: /Users/runner/.conan/data/fftw/3.3.8/_/_/build/55f3919d9a41efc78a625ee65e5d1ea60d02b2ff/source_subfolder/kernel/planner.c:261: assertion failed: SLVNDX(slot) == slvndx

Looking around, I've found this kind of issue: https://github.com/bytedeco/javacpp-presets/issues/435. I'm hoping it isn't the case that FFTW makes what I'm trying to do impossible (and rather that I'm just bad at C).

What I've tried

I've tried using free from libc.stdlib, but that doesn't do the trick. I also tried to create the arrays using cython.view arrays, but struggled to make them double complex (which is required by the ssht library). I tried to use the Cython debugger, but had issues getting it to work on my Mac. I've also spent 2 days banging my head against the wall...

My system

I compile my extension in the usual manner: python setup.py build_ext --inplace. I'm using Python 3.8.5 and Cython==0.29.21, running on macOS 11.0.1.

The code

My cython file:

import numpy as np
from libc.stdio cimport printf
from libc.stdlib cimport calloc, malloc
from cython.parallel import parallel, prange
from openmp cimport omp_get_thread_num

# needed to recreate without importing (for nogil)
cdef extern from "ssht/ssht.h" nogil:
    ctypedef enum ssht_dl_method_t:
        SSHT_DL_RISBO, SSHT_DL_TRAPANI
    void ssht_core_mw_inverse_sov_sym_ss(
        double complex *f,
        const double complex *flm,
        int L,
        int spin,
        ssht_dl_method_t dl_method,
        int verbosity
    )

def my_cython_module(int L, int threads):
    """
    dummy function, just to show that the parallel loop fails
    """
    cdef int ell, tid
    with nogil, parallel(num_threads=threads):
        tid = omp_get_thread_num()
        for ell in prange(L * L, schedule="guided"):
            printf("ell: %i\n", ell)
            _ssht_inverse(L, ell)

cdef double complex * _ssht_inverse(int L, int ind) nogil:
    """
    function creates a 1D complex array flm of zeros with a single 1
    then calls c function to get 2D complex array f
    not returning anything as it's just for demonstration
    """
    cdef ssht_dl_method_t dl_method = SSHT_DL_RISBO
    cdef double complex *flm = NULL
    cdef double complex *f = NULL
    flm = <double complex *> calloc(L * L, sizeof(double complex))
    flm[ind] = 1
    f = <double complex *> malloc((L + 1) * (2 * L) * sizeof(double complex))
    ssht_core_mw_inverse_sov_sym_ss(f, flm, L, 0, dl_method, 0)
    return f

My setup.py:

import os
from Cython.Build import cythonize
from setuptools import Extension, setup

# running on mac so need GCC instead of clang
os.environ["CC"] = "gcc-10"

setup(
    ext_modules=cythonize(
        Extension(
            "test",
            ["*.pyx"],
            extra_compile_args=["-fopenmp"],
            extra_link_args=["-fopenmp"],
            include_dirs=["/usr/local/include"],
        ),
        annotate=True,
        language_level=3,
        compiler_directives=dict(boundscheck=False, embedsignature=True),
    ),
)

A minimal working example in python using parallelism

The following (after pip install pyssht) runs successfully in parallel, so the problem seems to be with C/Cython.

# the cython wrapper from the external library
from pyssht import ssht_inverse_mwss_complex
import numpy as np
from multiprocess import Pool

def my_python_implementation(L, threads):
    """
    the python equivalent in parallel
    """
    def func(chunk):
        """
        deals with each chunk
        """
        for ell in chunk:
            print(f"ell: {ell}")
            # complex128 is the same dtype as the np.complex_ alias, which was removed in NumPy 2.0
            flm = np.zeros(L * L, dtype=np.complex128)
            flm[ell] = 1
            ssht_inverse_mwss_complex(flm, L, 0)

    chunks = np.array_split(np.arange(L * L), threads)
    with Pool(processes=threads) as p:
        p.map(func, chunks)

Thanks in advance!

Seeing as I was able to run it in parallel in python I'm really hoping that it can be done.

Answer 1

As @DavidW pointed out, I'm running into issues because FFTW can't be used this way from multiple threads (it works in Python with multiprocessing because each worker is a separate process). FFTW's planner is not thread-safe, which matches the planner.c assertion failure in the question. So the problem lies in the external code I'm using, which relies on FFTW. I've raised an issue to see if we can force the FFTW parts to be single threaded: https://github.com/astro-informatics/ssht/issues/44

solution1 0 ACCEPTED 2020-11-28 09:55:17
