Django queryset iterator for ordered querysets

Question

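I want to use a queryset iterator to iterate over a large dataset. Django provides iterator() for this, but it hits the database on each iteration. I found the following code, which iterates in chunks: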

import gc


def queryset_iterator(queryset, chunksize=1000):
    '''
    Iterate over a Django Queryset ordered by the primary key

    This method loads a maximum of chunksize (default: 1000) rows in its
    memory at the same time while Django normally would load all rows in its
    memory. Using the iterator() method only causes it to not preload all the
    classes.

    Note that the implementation of the iterator
    does not support ordered query sets.
    '''
    pk = 0
    # Highest primary key at the time iteration starts.
    last_pk = queryset.order_by('-pk').values_list('pk', flat=True).first()
    if last_pk is not None:
        queryset = queryset.order_by('pk')
        while pk < last_pk:
            # Fetch the next chunk of rows whose pk is above the last one seen.
            for row in queryset.filter(pk__gt=pk)[:chunksize]:
                pk = row.pk
                yield row
            gc.collect()

This works for unordered querysets. Is there any solution or workaround for doing this on an ordered queryset?

Answer 1

Here is mine, with ordering support.

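By the way, the iterator you are using can loop forever if the queryset changes while it is being processed (items deleted or added, even a single one). If the row that held last_pk disappears, pk never reaches last_pk and the while loop never exits.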
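As an aside, here is a minimal sketch of one way to avoid that hang: stop as soon as a chunk comes back empty instead of comparing against a precomputed last_pk. It runs against an in-memory dict rather than Django. The names chunked_by_pk, fetch_chunk and table are hypothetical stand-ins, with fetch_chunk playing the role of queryset.filter(pk__gt=pk).order_by('pk')[:chunksize].

```python
def chunked_by_pk(fetch_chunk, chunksize=1000):
    """Yield rows in ascending pk order, one chunk at a time.

    fetch_chunk(after_pk, limit) is a hypothetical callable that returns up to
    `limit` rows with pk greater than after_pk (or from the start if None).
    """
    pk = None
    while True:
        rows = fetch_chunk(pk, chunksize)
        if not rows:
            # An empty chunk means nothing is left, whatever the max pk was.
            return
        for row in rows:
            pk = row["pk"]
            yield row


# In-memory stand-in for a table with primary keys 1..7 (an assumption).
table = {pk: {"pk": pk} for pk in range(1, 8)}


def fetch_chunk(after_pk, limit):
    pks = sorted(p for p in table if after_pk is None or p > after_pk)
    return [table[p] for p in pks[:limit]]


seen = []
for row in chunked_by_pk(fetch_chunk, chunksize=3):
    seen.append(row["pk"])
    if row["pk"] == 2:
        # The row holding the highest pk is deleted while we iterate.
        del table[7]
```

With a fixed last_pk of 7, the loop in the question would keep running after pk reached 6; here the empty chunk ends the generator.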

The iterator below also avoids the useless query for last_pk:

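import gc


def queryset_iterator(queryset, chunksize=10000, key=None):
    # Accept a single field name or a list of field names; default to the pk.
    key = [key] if isinstance(key, str) else (key or ['pk'])
    counter = 0  # total rows yielded so far
    count = chunksize  # rows yielded by the most recent chunk
    # A short chunk (fewer than chunksize rows) means the end was reached.
    while count == chunksize:
        offset = counter - counter % chunksize
        count = 0
        for item in queryset.all().order_by(*key)[offset:offset + chunksize]:
            count += 1
            yield item
        counter += count
        gc.collect()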
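One caveat with offset slicing is that rows can be skipped or repeated if the table changes between chunks, and the database typically still has to walk past the skipped rows on each query. A different technique, keyset pagination, remembers the last (sort value, pk) pair and asks only for rows after it. The sketch below is not the code above. It uses the standard-library sqlite3 module instead of Django, and the item table, its columns and the iter_ordered function are assumptions made up for illustration.

```python
import sqlite3


def iter_ordered(conn, chunksize=2):
    """Yield (name, id) rows ordered by name, then id, one chunk at a time."""
    last = None  # (name, id) of the last row yielded so far
    while True:
        if last is None:
            rows = conn.execute(
                "SELECT name, id FROM item ORDER BY name, id LIMIT ?",
                (chunksize,),
            ).fetchall()
        else:
            # Only rows strictly after the last (name, id) pair; the id
            # breaks ties between rows that share the same name.
            rows = conn.execute(
                "SELECT name, id FROM item "
                "WHERE name > ? OR (name = ? AND id > ?) "
                "ORDER BY name, id LIMIT ?",
                (last[0], last[0], last[1], chunksize),
            ).fetchall()
        if not rows:
            return
        yield from rows
        last = rows[-1]


# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO item (id, name) VALUES (?, ?)",
    [(1, "pear"), (2, "apple"), (3, "fig"), (4, "apple"), (5, "kiwi")],
)
result = list(iter_ordered(conn, chunksize=2))
```

In Django, the same idea would presumably be expressed by ordering on the sort field plus pk and filtering with Q objects; that translation is left as an assumption here.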

Solution 1: 7, accepted, 2017-12-17 10:30:47
