
Cursor not found while reading all documents from a collection

I have a collection `students` and I want to read this collection as a `list` in Python, but unfortunately I get the following error: `CursorNextError: [HTTP 404][ERR 1600] cursor not found`. Is there a way to read a 'huge' collection without this error?

from arango import ArangoClient

# Initialize the ArangoDB client.
client = ArangoClient()

# Connect to database as  user.
db = client.db(<db>, username=<username>, password=<password>)

print(db.collections())
students = db.collection('students')
#students.all()

students = db.collection('students').all()
list(students)
[OUT] CursorNextError: [HTTP 404][ERR 1600] cursor not found

students = list(db.collection('students'))
[OUT] CursorNextError: [HTTP 404][ERR 1600] cursor not found

As suggested in my comment, if raising the TTL is not an option (which I wouldn't do either), I would get the data in chunks instead of all at once. In most cases you don't need the whole collection anyway, so maybe think about limiting that first. Do you really need all documents and all their fields? That being said, I have no experience with ArangoDB, but this is what I would do:

entries = db.collection('students').count()  # total number of documents in the collection
limit = 100                                  # block size you want to request per call
yourlist = []                                # final output
for x in range(int(entries / limit) + 1):
    # skip the blocks already fetched, then request the next block
    block = db.collection('students').all(skip=x * limit, limit=limit)
    yourlist.extend(block)  # the returned cursor is iterable, so extend() consumes it

Something like this (based on the documentation here: https://python-driver-for-arangodb.readthedocs.io/_/downloads/en/dev/pdf/ ).

Limit your request to a reasonable amount and then skip that amount with your next request. You have to check whether `range()` works like that here; you might have to think of a better way of defining the number of iterations you need. This also assumes ArangoDB sorts the `all()` results the same way on every call by default.
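One way to make the iteration count robust (a plain-Python sketch, not ArangoDB-specific): use `math.ceil` so that a collection size that is an exact multiple of the block size does not trigger an extra empty request, which the `int(entries / limit) + 1` approach would.

```python
import math

def num_blocks(entries: int, limit: int) -> int:
    # ceil(entries / limit): the smallest number of requests covering all entries
    return math.ceil(entries / limit)

print(num_blocks(1000, 100))  # 10 blocks, no trailing empty request
print(num_blocks(1001, 100))  # 11 blocks, the last one holds a single entry
```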

So what is the idea?

  1. Determine the number of entries in the collection.
  2. Based on that, determine how many requests you need (e.g. 1000 entries with limit=100 -> 10 blocks, each containing 100 entries).
  3. Make x requests, skipping the blocks you already have: first iteration entries 1-100, second iteration 101-200, third iteration 201-300, and so on.

By default, AQL queries compute the complete result, which is then held in memory and provided batch by batch. So the cursor is simply fetching the next batch of the already calculated result. In most cases this is fine, but if your query produces a huge result set, this can take a long time and require a lot of memory.
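With python-arango you can influence this batching through the `batch_size` and `ttl` parameters of `db.aql.execute`. A minimal sketch (the connection placeholders and the `students` query mirror the question; a longer TTL only delays cursor expiry, it does not remove the memory cost):

```python
from arango import ArangoClient

client = ArangoClient()
db = client.db('<db>', username='<username>', password='<password>')

# Fetch the whole (pre-computed) result in batches of 100 documents,
# keeping the server-side cursor alive for 600 seconds between fetches.
cursor = db.aql.execute(
    'FOR s IN students RETURN s',
    batch_size=100,
    ttl=600,
)
students = list(cursor)
```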

As an alternative you can create a streaming cursor. See https://www.arangodb.com/docs/stable/http/aql-query-cursor-accessing-cursors.html and check the `stream` option. Streaming cursors calculate the next batch on demand and are therefore better suited to iterating over a large collection.
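In python-arango the same option is exposed as the `stream` flag on `db.aql.execute`. A sketch, assuming the same placeholder connection as in the question (`process` stands in for whatever you do with each document):

```python
from arango import ArangoClient

client = ArangoClient()
db = client.db('<db>', username='<username>', password='<password>')

# stream=True requests a streaming cursor: batches are computed on demand
# instead of materializing the full result on the server first.
cursor = db.aql.execute('FOR s IN students RETURN s', stream=True)
try:
    for doc in cursor:
        process(doc)  # hypothetical per-document handler
finally:
    # release the server-side cursor when done (or on error)
    cursor.close(ignore_missing=True)
```

Note that with a streaming cursor the server holds the query open while you iterate, so close it explicitly instead of letting it time out.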
