Batch check document existence in Vespa

I have a list of docids and want to check whether they exist in Vespa. If a document exists, I want to return a specific field of that document. Currently I'm doing this sequentially. Sample code in Python:

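import requests

# document/v1 GET path: /document/v1/<namespace>/<document-type>/docid/<id>
doc_urlbase = 'http://localhost:8080/document/v1/test/test/docid'
field_name = 'my_field'  # placeholder: the field to return
docid_list = [1, 2, 3, 4, 5]
result = {}
for docid in docid_list:
    doc_url = '{}/{}'.format(doc_urlbase, docid)
    req = requests.get(doc_url)
    if req.status_code == 200:
        # docid is in Vespa, save the field value
        result[docid] = req.json().get('fields', {}).get(field_name, '')
    else:
        # display not found
        print('{} not found'.format(docid))
        result[docid] = ''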

I'm hoping there's a better way to do this that returns an array/map as the result. Something like:

Query given:
    docid_list = [1,2,3,4,5]

Return:
    {
        1: "field value",
        2: "field value",
        3: "",             # not in Vespa
        4: "field value",
        5: "field value",
    }

Thanks!

If your list is large relative to the corpus, you can use vespa-visit to quickly dump all document IDs and then match the two sets.
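The set-matching step can be sketched in Python. This assumes the dump has already been parsed into dicts with `id` and `fields` keys (the shape Vespa uses for documents in JSON; the exact vespa-visit output format depends on version and flags, so check yours). Vespa document IDs have the form `id:<namespace>:<document-type>:<key/value>:<user-specified>`, so the user-specified part is everything after the fourth colon. `match_dump` is a hypothetical helper name.

```python
def user_id(vespa_doc_id):
    # "id:test:test::1" -> "1"; maxsplit=4 keeps any colons inside the user part
    return vespa_doc_id.split(":", 4)[4]


def match_dump(dumped_docs, docid_list, field):
    """Build {docid: field value}, using "" for IDs absent from the dump."""
    wanted = {str(d) for d in docid_list}
    found = {}
    for doc in dumped_docs:
        uid = user_id(doc["id"])
        if uid in wanted:  # set lookup, so the dump is scanned only once
            found[uid] = doc.get("fields", {}).get(field, "")
    return {d: found.get(str(d), "") for d in docid_list}


docs = [
    {"id": "id:test:test::1", "fields": {"title": "first"}},
    {"id": "id:test:test::2", "fields": {"title": "second"}},
]
print(match_dump(docs, [1, 2, 3], "title"))
```

Scanning the dump once against a set keeps the cost linear in corpus size, which is why this only pays off when the ID list is a large fraction of the corpus.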

I assume that is not the case. If you do this frequently, you can create a Component, such as a Searcher or a Handler, that you POST the ID list to. In the Component, use the Java Document API to Get each ID, and create a Hit for each match. Each such Get takes on the order of milliseconds, so this will be quicker; the tradeoff is that you will have to write some code.
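On the client side, the Searcher's response then has to be turned back into the map from the question. A minimal sketch, assuming a hypothetical batch-get Searcher that adds one Hit per existing document, sets the Hit id to the Vespa document ID, and copies the requested field into the Hit's fields. Vespa's default JSON search result nests hits under `root.children`. `hits_to_map` is a hypothetical helper name.

```python
import json


def hits_to_map(response_body, docid_list, field):
    """Turn a search-result JSON body into {docid: field value}.

    Assumes (hypothetically) one Hit per existing document, with the Hit id
    set to the Vespa document id and the field copied into the Hit's fields.
    """
    result = json.loads(response_body)
    # "children" is absent when no hits were added
    children = result.get("root", {}).get("children", [])
    found = {}
    for hit in children:
        # "id:test:test::1" -> "1"
        user_part = hit["id"].split(":", 4)[4]
        found[user_part] = hit.get("fields", {}).get(field, "")
    # IDs without a Hit were not found in Vespa
    return {d: found.get(str(d), "") for d in docid_list}


body = json.dumps({"root": {"id": "toplevel", "children": [
    {"id": "id:test:test::1", "fields": {"title": "first"}},
]}})
print(hits_to_map(body, [1, 2], "title"))
```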

You can also run the same code from a standalone Java program.
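If writing Java is not an option, a standalone client can at least avoid waiting on each GET in turn. This is a swapped-in technique (client-side concurrency over the same document/v1 GETs as in the question), not the Java Document API approach. A minimal sketch, assuming the `test` namespace and document type from the question and a hypothetical field named `title`; a 404 is treated as "not in Vespa".

```python
import json
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Assumed endpoint: /document/v1/<namespace>/<document-type>/docid/<id>
DOC_URLBASE = "http://localhost:8080/document/v1/test/test/docid"
FIELD = "title"  # hypothetical field name


def fetch_field(docid, field=FIELD, urlbase=DOC_URLBASE):
    """GET one document; return the field value, or "" if it does not exist."""
    try:
        with urllib.request.urlopen(f"{urlbase}/{docid}", timeout=10) as resp:
            body = json.load(resp)
    except urllib.error.HTTPError as err:
        if err.code == 404:  # document not found
            return ""
        raise
    return body.get("fields", {}).get(field, "")


def batch_check(docid_list, fetch=fetch_field, max_workers=8):
    """Run the GETs in a thread pool and return {docid: field value}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so zip pairs each id with its result
        return dict(zip(docid_list, pool.map(fetch, docid_list)))


# Usage against a running Vespa:
# print(batch_check([1, 2, 3, 4, 5]))
```

Threads suit this because each call spends almost all its time waiting on the network; `max_workers` bounds how many requests hit Vespa at once.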
