Using C-like arrays in Python

Is the following ever done in Python to minimize the allocation time of creating new objects in a for loop? Or is it considered bad practice, and is there a better alternative?

for row in rows:
    data_saved_for_row = [] # re-initializes every time (takes a while)
    for item in row:
        do_something()
    do_something_with_row()

vs. the "C version":

data_saved_for_row = [None] * (max_row_len + 1)  # allocated once; max_row_len is an assumed bound
for row in rows:
    for index, item in enumerate(row):
        do_something()
    data_saved_for_row[index + 1] = '\0' # now we have a crude way of knowing
    do_something_with_row()              # when it ends without having
                                         # to always reinitialize

Normally the second approach seems like a terrible idea, but I've run into situations, when iterating over a million or more items, where the initialization of the row:

data_saved_for_row = []

has taken a second or more.

Here's an example:

>>> import timeit
>>> print(timeit.timeit(stmt="l = list()", number=int(1e8)))
7.77035903931
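
To put that number in per-call terms: 7.77 s over 1e8 calls is roughly 78 ns per list(). The sketch below assumes Python 3 and uses a much smaller iteration count so it finishes quickly. It times three ways of ending up with an empty list. The helper name time_stmt is made up for illustration, and absolute numbers will vary by machine and interpreter version.

```python
import timeit

N = 1_000_000  # far fewer iterations than the 1e8 above, to keep the run short

def time_stmt(stmt, setup="pass"):
    """Return the average seconds per execution of stmt (hypothetical helper)."""
    return timeit.timeit(stmt=stmt, setup=setup, number=N) / N

results = {
    "list()": time_stmt("l = list()"),
    "[]": time_stmt("l = []"),
    "l.clear()": time_stmt("l.clear()", setup="l = []"),
}

for label, seconds in results.items():
    print(f"{label:>10}: {seconds * 1e9:.1f} ns per call")
```

Note that the l.clear() case only measures clearing an already empty list, so it is a lower bound rather than a realistic per-row cost.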

If you need this level of performance, you may as well write it in C yourself and import it with ctypes or something similar. But then, if you're writing this kind of performance-driven application, why are you using Python in the first place?
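
If all that is needed is a fixed-size C array rather than custom C code, ctypes can allocate one directly from Python. A minimal sketch, assuming rows of C ints and a known upper bound on row length (MAX_ROW_LEN and fill_row are illustrative names, not from the question):

```python
import ctypes

MAX_ROW_LEN = 8  # assumed upper bound on row length (hypothetical)

# Allocate one C array of ints up front and reuse it for every row.
RowBuffer = ctypes.c_int * MAX_ROW_LEN
buffer = RowBuffer()

def fill_row(row):
    """Copy row into the shared buffer and return how many slots are in use."""
    for index, item in enumerate(row):
        buffer[index] = item
    return len(row)

rows = [[1, 2, 3], [4, 5]]
for row in rows:
    used = fill_row(row)
    print(list(buffer[:used]))
```

Element access still goes through the interpreter, so this is not necessarily faster than a plain list; it mainly illustrates the allocate-once pattern. A row longer than MAX_ROW_LEN would raise IndexError.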

You can use list.clear() as a middle ground here, which reuses the same list object instead of creating a new one on every iteration:

data_saved_for_row = []
for row in rows:
    data_saved_for_row.clear()
    for item in row:
        do_something()
    do_something_with_row()

but this isn't a perfect solution, as shown by the CPython source for it (comments omitted):

static int
_list_clear(PyListObject *a)
{
    Py_ssize_t i;
    PyObject **item = a->ob_item;
    if (item != NULL) {
        i = Py_SIZE(a);
        Py_SIZE(a) = 0;
        a->ob_item = NULL;
        a->allocated = 0;
        while (--i >= 0) {
            Py_XDECREF(item[i]);
        }
        PyMem_FREE(item);
    }

    return 0;
}

I'm not perfectly fluent in C, but this code looks like it frees the memory held by the list, so that memory will have to be reallocated anyway once you start adding items to the list again. This strongly implies that Python just doesn't natively support your approach.
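
One way to observe this from Python, without reading the C, is to compare sys.getsizeof before and after clear(). This sketch relies on CPython behavior, where getsizeof for a list includes its allocated item buffer; other implementations may report something different.

```python
import sys

empty_size = sys.getsizeof([])

data = list(range(1000))
filled_size = sys.getsizeof(data)

data.clear()
cleared_size = sys.getsizeof(data)

print("empty:  ", empty_size)
print("filled: ", filled_size)
print("cleared:", cleared_size)  # on CPython, back to the size of an empty list
```

If the cleared size drops back to the empty size, the item buffer was released rather than kept around for reuse.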


Or you could write your own Python data structure (as a subclass of list, maybe) that implements this paradigm (never actually clearing its own list, but maintaining a running notion of its own length), which might be a cleaner solution for your use case than implementing it in C.
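
A rough sketch of that idea follows. It uses a small wrapper class rather than a list subclass to keep the example short; the name ReusableRow and its methods are invented for illustration.

```python
class ReusableRow:
    """Preallocated storage with a logical length that can be reset cheaply."""

    def __init__(self, capacity):
        self._items = [None] * capacity  # allocated once, never shrunk
        self._length = 0                 # number of slots currently in use

    def append(self, value):
        if self._length == len(self._items):
            self._items.append(value)    # grow only when capacity is exceeded
        else:
            self._items[self._length] = value
        self._length += 1

    def reset(self):
        # Forget the contents without freeing the underlying storage.
        self._length = 0

    def __len__(self):
        return self._length

    def __iter__(self):
        for index in range(self._length):
            yield self._items[index]


row_data = ReusableRow(capacity=4)
for row in [[1, 2, 3], [4, 5]]:
    row_data.reset()
    for item in row:
        row_data.append(item * 10)
    print(list(row_data))
```

Two caveats: stale values stay referenced in unused slots after reset(), and every append is a Python-level method call, so it is worth timing this against a plain [] before adopting it.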
