
Is itertools thread-safe?

For instance, if I create an iterator using chain, can I call it from multiple threads? Note that thread-safety that relies on the GIL is acceptable, but not preferable.

(Note that this is a bit different from this question, which deals with generators, not iterators written in C.)

Firstly, nothing in the official documentation on itertools says that they are thread-safe. So, by specification, Python does not guarantee anything about that. The behaviour might differ across implementations such as Jython or PyPy, which means your code probably won't be portable.

Secondly, most itertools (with the exception of simple ones, like count) take other iterators as their input. You would need those input iterators to behave correctly in a thread-safe way as well.

Thirdly, some iterators might not make sense when used simultaneously by different threads. For example, izip working in multiple threads might run into a race condition while taking elements from multiple sources, especially as defined by the equivalent Python code (what happens when one thread manages to take a value from only one input iterator, and then a second thread takes values from two of them?).
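To make that interleaving concrete, here is a small deterministic sketch. It does not use real threads: each simulated "thread" is a generator that pauses between the two next() calls that the pure-Python equivalent of izip would make, so the switch point can be chosen by hand. The helper name take_pair_stepwise is hypothetical and is not part of itertools.

```python
def take_pair_stepwise(it_a, it_b):
    """Mimic one izip step, pausing where a thread switch could occur."""
    a = next(it_a)
    yield None          # simulated preemption point between the two reads
    b = next(it_b)
    yield (a, b)


letters = iter("abc")
numbers = iter([1, 2, 3])

# Two simulated threads sharing the same pair of input iterators.
t1 = take_pair_stepwise(letters, numbers)
t2 = take_pair_stepwise(letters, numbers)

next(t1)            # thread 1 takes 'a' and is then preempted
next(t2)            # thread 2 takes 'b'
pair2 = next(t2)    # thread 2 also takes 1 and completes its pair
pair1 = next(t1)    # thread 1 resumes and takes 2

print(pair2)        # ('b', 1)
print(pair1)        # ('a', 2)
```

Neither result is a pair that a single-threaded izip over the same inputs would have produced, which is the kind of inconsistency described above.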

Also note that the documentation does not mention that itertools are implemented in C. We know (as an implementation detail) that CPython's itertools are actually written in C, but on other implementations they can happily be implemented as generators, which takes you back to the question you cited.

So, no, you cannot assume that they are thread-safe unless you know the implementation details of your target Python platform.
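If portability matters, one common workaround (not something the itertools documentation prescribes) is to serialize access with an explicit lock. The following is a minimal sketch under that assumption; the class name LockedIterator and the helper generator numbers are hypothetical names introduced only for this illustration.

```python
import itertools
import threading


class LockedIterator:
    """Wrap any iterator so that only one thread at a time advances it."""

    def __init__(self, iterable):
        self._it = iter(iterable)
        self._lock = threading.Lock()

    def __iter__(self):
        return self

    def __next__(self):
        # The lock is held for the whole next() call, including any
        # work done by the underlying input iterators.
        with self._lock:
            return next(self._it)


def numbers(start, stop):
    """A plain generator, which is not safe to advance concurrently."""
    for i in range(start, stop):
        yield i


shared = LockedIterator(itertools.chain(numbers(0, 500), numbers(500, 1000)))
results = []
results_lock = threading.Lock()


def worker():
    local = list(shared)        # drain whatever this thread manages to get
    with results_lock:
        results.extend(local)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(len(results))             # every item is consumed exactly once
```

The lock makes each individual next() call atomic; it does not make a sequence of several calls atomic, so code that needs to read multiple items as one unit still has to hold its own lock around that sequence.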

The current implementation appears to be atomic (thread-safe).

CPython-3.8, https://github.com/python/cpython/blob/v3.8.1/Modules/itertoolsmodule.c#L4129

static PyTypeObject count_type = {
    PyVarObject_HEAD_INIT(NULL, 0)
    "itertools.count",                  /* tp_name */
    sizeof(countobject),                /* tp_basicsize */
    0,                                  /* tp_itemsize */
    /* methods */
    (destructor)count_dealloc,          /* tp_dealloc */
    0,                                  /* tp_vectorcall_offset */
    0,                                  /* tp_getattr */
    0,                                  /* tp_setattr */
    0,                                  /* tp_as_async */
    (reprfunc)count_repr,               /* tp_repr */
    0,                                  /* tp_as_number */
    0,                                  /* tp_as_sequence */
    0,                                  /* tp_as_mapping */
    0,                                  /* tp_hash */
    0,                                  /* tp_call */
    0,                                  /* tp_str */
    PyObject_GenericGetAttr,            /* tp_getattro */
    0,                                  /* tp_setattro */
    0,                                  /* tp_as_buffer */
    Py_TPFLAGS_DEFAULT | Py_TPFLAGS_HAVE_GC |
        Py_TPFLAGS_BASETYPE,            /* tp_flags */
    itertools_count__doc__,             /* tp_doc */
    (traverseproc)count_traverse,       /* tp_traverse */
    0,                                  /* tp_clear */
    0,                                  /* tp_richcompare */
    0,                                  /* tp_weaklistoffset */
    PyObject_SelfIter,                  /* tp_iter */
    (iternextfunc)count_next,           /* tp_iternext */
    count_methods,                      /* tp_methods */
    0,                                  /* tp_members */
    0,                                  /* tp_getset */
    0,                                  /* tp_base */
    0,                                  /* tp_dict */
    0,                                  /* tp_descr_get */
    0,                                  /* tp_descr_set */
    0,                                  /* tp_dictoffset */
    0,                                  /* tp_init */
    0,                                  /* tp_alloc */
    itertools_count,                    /* tp_new */
    PyObject_GC_Del,                    /* tp_free */
};

// ... ... ...

static PyObject *
count_nextlong(countobject *lz)
{
    PyObject *long_cnt;
    PyObject *stepped_up;

    long_cnt = lz->long_cnt;
    if (long_cnt == NULL) {
        /* Switch to slow_mode */
        long_cnt = PyLong_FromSsize_t(PY_SSIZE_T_MAX);
        if (long_cnt == NULL)
            return NULL;
    }
    assert(lz->cnt == PY_SSIZE_T_MAX && long_cnt != NULL);

    stepped_up = PyNumber_Add(long_cnt, lz->long_step);
    if (stepped_up == NULL)
        return NULL;
    lz->long_cnt = stepped_up;
    return long_cnt;
}

static PyObject *
count_next(countobject *lz)
{
    if (lz->cnt == PY_SSIZE_T_MAX)
        return count_nextlong(lz);
    return PyLong_FromSsize_t(lz->cnt++);
}

It is atomic because there is no place between stepped_up = PyNumber_Add(long_cnt, lz->long_step); and lz->long_cnt = stepped_up; (or inside that PyNumber_Add() call) where threads could be switched. That path is the so-called "slow mode".

In "fast mode", the expression PyLong_FromSsize_t(lz->cnt++) is obviously atomic as well.

The rest of the thread-safety is provided by the GIL:

  • Thread switches happen only at certain points while Python bytecode is running, and inside I/O functions.

  • It acts as a memory fence, which eliminates the side effects of memory reordering.
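The claim above can be checked empirically with a short experiment. This is only a sketch that assumes a GIL-enabled CPython build; a passing run does not prove thread-safety, and other implementations may behave differently. The constants N_THREADS and CALLS_PER_THREAD are arbitrary values chosen for this illustration.

```python
import itertools
import threading

N_THREADS = 4
CALLS_PER_THREAD = 10_000

counter = itertools.count()                 # shared between all threads
per_thread = [[] for _ in range(N_THREADS)]


def worker(bucket):
    # Each thread advances the same count object many times.
    for _ in range(CALLS_PER_THREAD):
        bucket.append(next(counter))


threads = [threading.Thread(target=worker, args=(bucket,))
           for bucket in per_thread]
for t in threads:
    t.start()
for t in threads:
    t.join()

seen = [value for bucket in per_thread for value in bucket]

# If next(counter) is atomic, no value is duplicated and none is skipped.
print(len(seen) == len(set(seen)))
```

If count_next were not atomic, two threads could occasionally observe the same value or skip one, and the set of values seen would no longer match the total number of calls.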
