python-couchdb pager hitting recursion depth limit

I am creating a pager that returns documents from an Apache CouchDB map function using python-couchdb. This generator function works well until it hits the maximum recursion depth. How can it be changed to use iteration rather than recursion?

def page(db, view_name, limit, include_docs=True, **opts):
    """
    `page` returns all documents of a CouchDB map function. It accepts
    all options that `couchdb.Database.view` does; however, `include_docs`
    should be omitted, because this will interfere with things.

    >>> import couchdb
    >>> db = couchdb.Server()['database']
    >>> for doc in page(db, '_all_docs', 100):
    ...     doc
    #etc etc
    >>> del db['database']

    Notes on implementation:
      - `last_doc` is assigned on every loop, because there doesn't seem to
        be an easy way to know if something is the last item in the iteration.
    """

    last_doc = None
    for row in db.view(view_name,
                     limit=limit+1,
                     include_docs=include_docs,
                     **opts):
        last_doc = row.key, row.id
        yield row.doc
    if last_doc:
        # Recursive call: every page adds one more nested generator.
        for doc in page(db, view_name, limit,
                        include_docs=include_docs,
                        startkey=last_doc[0],
                        startkey_docid=last_doc[1]):
            yield doc

Here's something to get you started. You didn't specify what **opts might be; if you only need startkey and startkey_docid to start the recursion, and not some other fields, then you can get rid of the extra function.

Obviously, untested.

def page_key(db, view_name, limit, startkey, startkey_docid, inc_docs=True):
    # A list used as a work queue takes the place of the recursive call.
    queue = [(startkey, startkey_docid)]
    while queue:
        key = queue.pop()

        rows = list(db.view(view_name,
                            limit=limit+1,
                            include_docs=inc_docs,
                            startkey=key[0],
                            startkey_docid=key[1]))
        # Yield at most `limit` rows. startkey is inclusive, so the extra
        # row is held back and becomes the first row of the next batch.
        for row in rows[:limit]:
            yield row.doc

        # A short batch means the view is exhausted, so nothing is queued.
        if len(rows) > limit:
            queue.append((rows[limit].key, rows[limit].id))

def page(db, view_name, limit, inc_docs=True, **opts):
    rows = list(db.view(view_name,
                        limit=limit+1,
                        include_docs=inc_docs,
                        **opts))
    for row in rows[:limit]:
        yield row.doc

    if len(rows) > limit:
        for doc in page_key(db, view_name, limit,
                            rows[limit].key, rows[limit].id, inc_docs):
            yield doc
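
If startkey and startkey_docid really are the only fields that change between batches, the two functions can probably be folded into one loop that overwrites those two options on each pass. The following is a minimal sketch under that assumption, not a tested python-couchdb program: `FakeDB` and `FakeRow` are hypothetical in-memory stand-ins so the example runs without a server, and `page_iter` is a name made up here.

```python
class FakeRow(object):
    """Hypothetical stand-in for a python-couchdb view row."""
    def __init__(self, key, id, doc):
        self.key, self.id, self.doc = key, id, doc


class FakeDB(object):
    """Hypothetical in-memory database with one view sorted by key.

    Only `limit` and `startkey` are honoured; `startkey` is inclusive,
    as in CouchDB. Every other option is accepted and ignored.
    """
    def __init__(self, n):
        self.rows = [FakeRow(i, "doc-%04d" % i, {"_id": "doc-%04d" % i})
                     for i in range(n)]
        self.queries = 0  # counts how many batches were requested

    def view(self, name, limit=None, startkey=None, **ignored):
        self.queries += 1
        rows = [r for r in self.rows if startkey is None or r.key >= startkey]
        return rows[:limit]


def page_iter(db, view_name, limit, inc_docs=True, **opts):
    """Yield every document of a view, reading `limit` rows per request."""
    params = dict(opts)  # copy, so the caller's options are left alone
    while True:
        rows = list(db.view(view_name, limit=limit + 1,
                            include_docs=inc_docs, **params))
        # Yield at most `limit` rows; the extra row only marks the next start.
        for row in rows[:limit]:
            yield row.doc
        if len(rows) <= limit:
            break  # short batch: the view is exhausted
        params["startkey"] = rows[limit].key
        params["startkey_docid"] = rows[limit].id


db = FakeDB(25)
docs = list(page_iter(db, "by_key", 10))
print(len(docs), db.queries)
```

Because the loop keeps only the current batch and the next start position, the call stack stays flat no matter how many batches the view needs.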

This is an alternative approach that I've tested (manually) on a database with >800k docs. Seems to work.

def page2(db, view_name, limit, inc_docs=True, **opts):
    def get_batch(**start):
        # Combine the caller's options with the start position of this batch.
        params = dict(opts, **start)
        return list(db.view(view_name, limit=limit+1,
                            include_docs=inc_docs, **params))
    next_start = None
    total_rows = db.view(view_name, limit=1).total_rows
    # Upper bound on the number of batches needed.
    batches = (total_rows // limit) + 1
    for i in range(batches):
        if next_start is None:
            rows = get_batch()
        else:
            rows = get_batch(startkey=next_start[0],
                             startkey_docid=next_start[1])
        for row in rows[:limit]:
            yield row.doc or row # if include_docs is False,
                                 # row.doc will be None
        if len(rows) <= limit:
            break
        # startkey is inclusive, so the extra row starts the next batch
        # instead of being yielded twice.
        next_start = rows[limit].key, rows[limit].id

I don't use CouchDB, so I had a little trouble understanding the sample code. Here's a stripped-down version, which I believe works the way you want:

all_docs = range(0, 100)

def view(limit, offset):
    print("view: returning %d rows starting at %d" % (limit, offset))
    return all_docs[offset:offset+limit]

def generate_by_pages(page_size):
    offset = 0
    while True:
        rowcount = 0
        for row in generate_page(page_size, offset):
            rowcount += 1
            yield row
        if rowcount == 0:
            # an empty page means there is nothing left to read
            break
        offset += rowcount

def generate_page(page_size, offset):
    for row in view(page_size, offset):
        yield row

for r in generate_by_pages(10):
    print(r)

The key thing is replacing recursion with iteration. There are lots of ways to do this (I like trampolining in Python), but the above is straightforward.
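
For anyone who hasn't met trampolining: instead of calling itself, a function returns a zero-argument callable (a thunk) describing the next step, and a small driver loop keeps calling thunks until none is left. Here is a rough sketch of that idea applied to paging, assuming an offset-based `view` like the stripped-down one above; `fetch_page` and `trampoline` are names invented for this illustration.

```python
all_docs = list(range(5000))


def view(limit, offset):
    """Stand-in for a database query that returns one page of rows."""
    return all_docs[offset:offset + limit]


def fetch_page(page_size, offset):
    """Return (rows, thunk); the thunk is None once no rows are left."""
    rows = view(page_size, offset)
    if not rows:
        return rows, None
    # Describe the next step as a callable rather than making the call here.
    return rows, lambda: fetch_page(page_size, offset + len(rows))


def trampoline(step):
    """Call thunks in a loop, so the call stack never grows with page count."""
    while step is not None:
        rows, step = step()
        for row in rows:
            yield row


# A page size of 1 forces 5000 steps, well past the default recursion limit.
docs = list(trampoline(lambda: fetch_page(1, 0)))
print(len(docs))
```

Each step runs to completion and returns before the next one starts, which is why the number of pages no longer matters to the interpreter's stack.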
