
Global Interpreter Lock and access to data (e.g. for NumPy arrays)

I am writing a C extension for Python that should release the Global Interpreter Lock while it operates on data. I think I understand the mechanism of the GIL fairly well, but one question remains: can I access data in a Python object while the thread does not hold the GIL? For example, I want to read data from a (big) NumPy array in the C function while still allowing other threads to do other work on the other CPU cores. The C function should:

  • release the GIL with Py_BEGIN_ALLOW_THREADS
  • read and work on the data without calling any Python/C API functions
  • even write data to previously constructed NumPy arrays
  • reacquire the GIL with Py_END_ALLOW_THREADS

Is this safe? Other threads are, of course, not supposed to change the variables that the C function uses. But there may be a hidden source of errors: could the Python interpreter move an object, e.g. during garbage collection, while the C function works on it in a separate thread?
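The plan in the list above can be modeled without Python at all. This is a hypothetical sketch, not the CPython API: a pthread mutex named `toy_gil` stands in for the GIL, and `struct toy_array` and `toy_sum` are invented names. It shows the point the question hinges on: everything the loop needs is copied into C locals while the lock is held, and after the lock is released only raw memory is touched.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical model, not the CPython API: toy_gil plays the role of the GIL. */
static pthread_mutex_t toy_gil = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for an interpreter-owned array object. */
struct toy_array {
    size_t size;
    double *data;
};

/* Follows the steps listed above. The caller holds toy_gil on entry and
   holds it again on return, like a CPython extension function. */
static double toy_sum(const struct toy_array *arr)
{
    assert(arr != NULL);

    /* Still holding the lock: copy what the loop needs into C locals. */
    const double *data = arr->data;
    size_t size = arr->size;
    double total = 0.0;

    pthread_mutex_unlock(&toy_gil);   /* ~ Py_BEGIN_ALLOW_THREADS */

    /* Only the raw buffer is read here; arr itself is not dereferenced,
       so other threads may take toy_gil and run in the meantime. */
    for (size_t i = 0; i < size; i++)
        total += data[i];

    pthread_mutex_lock(&toy_gil);     /* ~ Py_END_ALLOW_THREADS */
    return total;
}
```

In the real extension, `Py_BEGIN_ALLOW_THREADS` and `Py_END_ALLOW_THREADS` take the place of the unlock and lock calls; they save and restore the thread state via `PyEval_SaveThread()` and `PyEval_RestoreThread()`.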

To illustrate the question with a minimal example, consider the (minimal but complete) code below. Compile it (on Linux) with

gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -fPIC -I/usr/lib/pymodules/python2.7/numpy/core/include -I/usr/include/python2.7 -c gilexample.c -o gilexample.o
gcc -pthread -shared gilexample.o -o gilexample.so

and test it in Python with

import gilexample
gilexample.sum([1,2,3])

Is the code between Py_BEGIN_ALLOW_THREADS and Py_END_ALLOW_THREADS safe? It accesses the contents of a Python object, and I do not want to duplicate the (possibly large) array in memory.

#include <Python.h>
#include <numpy/arrayobject.h>

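// The relevant function
static PyObject * sum(PyObject * const self, PyObject * const args) {
  PyObject * X;
  if (!PyArg_ParseTuple(args, "O", &X))
    return NULL;  // exception already set

  // New reference to an aligned, C-contiguous array of doubles (copied only if
  // needed); the linear data[i] indexing below relies on contiguity.
  // Older NumPy versions spell this flag NPY_IN_ARRAY.
  PyArrayObject * const X_double =
      (PyArrayObject *) PyArray_FROM_OTF(X, NPY_DOUBLE, NPY_ARRAY_IN_ARRAY);
  if (X_double == NULL)
    return NULL;

  // Query size and data pointer while the GIL is still held.
  npy_intp const size = PyArray_SIZE(X_double);
  double const * const data = (double const *) PyArray_DATA(X_double);
  double sum = 0;
  npy_intp i;

  Py_BEGIN_ALLOW_THREADS // IS THIS SAFE?

  for (i=0; i<size; i++)
    sum += data[i];

  Py_END_ALLOW_THREADS

  Py_DECREF(X_double);  // GIL is held again here
  return PyFloat_FromDouble(sum);
}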

// Python interface code
// List the C methods that this extension provides.
static PyMethodDef gilexampleMethods[] = {
  {"sum", sum, METH_VARARGS},
  {NULL, NULL, 0, NULL}     /* Sentinel - marks the end of this structure */
};

// Tell Python about these methods.
PyMODINIT_FUNC initgilexample(void)  {
  (void) Py_InitModule("gilexample", gilexampleMethods);
  import_array();  // Must be present for NumPy.
}

Is this safe?

Strictly, no, not if PyArray_SIZE and PyArray_DATA are called inside the GIL-less block. Call them before releasing the GIL, as the code above does; then the block operates on plain C data only. You should also hold a reference to the array object for the whole GIL-less block: increment its reference count before entering the block and decrement it afterwards. The new reference returned by PyArray_FROM_OTF already serves this purpose.

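With those changes it should be safe: CPython's garbage collector never relocates objects in memory, so the data pointer stays valid as long as a reference to the array is held and no other thread resizes or frees it. Don't forget to decrement the reference count afterwards, once the GIL has been reacquired.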
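The reference-count advice can be illustrated with a toy model, again hypothetical and not the CPython API: `toy_gil`, `struct toy_buf`, `toy_incref`/`toy_decref` and `sum_keep_alive` are invented names, and the helper thread exists only to simulate another thread dropping the owner's reference while the lock is released. Because the function took its own reference first, the buffer survives until the final decrement.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical model, not the CPython API: toy_gil guards every refcount. */
static pthread_mutex_t toy_gil = PTHREAD_MUTEX_INITIALIZER;
static int toy_frees = 0;  /* number of buffers deallocated so far */

struct toy_buf {
    long refcnt;   /* like ob_refcnt: only changed while toy_gil is held */
    size_t size;
    double *data;
};

/* Create a buffer holding 1.0, 2.0, ..., size; refcnt starts at 1. */
static struct toy_buf *toy_buf_new(size_t size)
{
    struct toy_buf *b = malloc(sizeof *b);
    assert(b != NULL);
    b->data = malloc(size * sizeof *b->data);
    assert(b->data != NULL);
    for (size_t i = 0; i < size; i++)
        b->data[i] = (double)(i + 1);
    b->size = size;
    b->refcnt = 1;
    return b;
}

static void toy_incref(struct toy_buf *b) { b->refcnt++; }

static void toy_decref(struct toy_buf *b)
{
    if (--b->refcnt == 0) {   /* last reference gone: free the buffer */
        free(b->data);
        free(b);
        toy_frees++;
    }
}

/* Simulated other thread: takes the lock and drops the owner's reference. */
static void *drop_owner_ref(void *arg)
{
    pthread_mutex_lock(&toy_gil);
    toy_decref(arg);
    pthread_mutex_unlock(&toy_gil);
    return NULL;
}

/* Caller holds toy_gil on entry and on return. */
static double sum_keep_alive(struct toy_buf *b)
{
    pthread_t other;
    double total = 0.0;

    toy_incref(b);                    /* our own reference: refcnt is now 2 */
    const double *data = b->data;     /* copy plain C values while locked */
    size_t size = b->size;
    pthread_mutex_unlock(&toy_gil);   /* ~ Py_BEGIN_ALLOW_THREADS */

    /* Owner's reference is dropped concurrently; refcnt falls to 1, no free. */
    int rc = pthread_create(&other, NULL, drop_owner_ref, b);
    assert(rc == 0);
    (void)rc;
    pthread_join(other, NULL);

    for (size_t i = 0; i < size; i++) /* data is still valid here */
        total += data[i];

    pthread_mutex_lock(&toy_gil);     /* ~ Py_END_ALLOW_THREADS */
    toy_decref(b);                    /* last reference: buffer is freed now */
    return total;
}
```

In the extension itself this corresponds to `Py_INCREF(obj)` before `Py_BEGIN_ALLOW_THREADS` and `Py_DECREF(obj)` after `Py_END_ALLOW_THREADS`; a new reference such as the one returned by `PyArray_FROM_OTF` plays the same role.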

Can I access data in a Python object while the thread does not own the GIL?

No, you cannot.
