
Fetching all the document URIs in MarkLogic using the Java Client API

I am trying to fetch all the documents from a database without knowing their exact URIs. I found this query:

DocumentPage documents =docMgr.read();
while (documents.hasNext()) {
    DocumentRecord document = documents.next();
    System.out.println(document.getUri());
}

But I do not have specific URIs; I want all the documents.

The first step is to enable the URIs lexicon on the database.
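If the lexicon is not already on, one way to turn it on (a sketch, assuming the Management REST API on its default port 8002 and a database named "Documents"; adjust host, credentials, and database name for your setup) is to set the database's `uri-lexicon` property:

```shell
# Hypothetical example: enable the URIs lexicon on the "Documents" database
# via the Management REST API. Adjust the credentials and database name.
curl -X PUT --anyauth -u admin:admin \
  -H "Content-Type: application/json" \
  -d '{"uri-lexicon": true}' \
  "http://localhost:8002/manage/v2/databases/Documents/properties"
```

The same property can also be toggled in the Admin UI under the database's configuration page.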

You could eval some XQuery and run cts:uris() (or server-side JavaScript and run cts.uris()):

    ServerEvaluationCall call = client.newServerEval()
        .xquery("cts:uris()");
    for ( EvalResult result : call.eval() ) {
        String uri = result.getString();
        System.out.println(uri);
    }

Two drawbacks are: (1) you'd need a user with privileges to run eval, and (2) there is no pagination.

If you have a small number of documents, you don't need pagination. But for a large number of documents, pagination is recommended. Here's some code using the search API and pagination:

    // do the next eight lines just once
    String options =
        "<options xmlns='http://marklogic.com/appservices/search'>" +
        "  <values name='uris'>" +
        "    <uri/>" +
        "  </values>" +
        "</options>";
    QueryOptionsManager optionsMgr = client.newServerConfigManager().newQueryOptionsManager();
    optionsMgr.writeOptions("uriOptions", new StringHandle(options));

    // run the following each time you need to list all uris
    QueryManager queryMgr = client.newQueryManager();
    long pageLength = 10000;
    queryMgr.setPageLength(pageLength);
    ValuesDefinition query = queryMgr.newValuesDefinition("uris", "uriOptions");
    // the following "and" query just matches all documents
    query.setQueryDefinition(new StructuredQueryBuilder().and());
    int start = 1;
    boolean hasMore = true;
    Transaction transaction = client.openTransaction();
    try {
        while ( hasMore ) {
            CountedDistinctValue[] uriValues =
                queryMgr.values(query, new ValuesHandle(), start, transaction).getValues();
            for (CountedDistinctValue uriValue : uriValues) {
                String uri = uriValue.get("string", String.class);
                //System.out.println(uri);
            }
            start += uriValues.length;
            // this is the last page if uriValues is smaller than pageLength
            hasMore = uriValues.length == pageLength;
        }
    } finally {
        transaction.commit();
    }
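The hasMore logic above can be exercised without a MarkLogic server. This standalone sketch (all names hypothetical) pages through an in-memory list the same way, stopping when a page comes back shorter than pageLength:

```java
import java.util.ArrayList;
import java.util.List;

public class PaginationSketch {

    // Hypothetical stand-in for queryMgr.values(...): returns one 1-based page
    // of synthetic URIs, or a short/empty page when the data runs out.
    static List<String> fetchPage(int total, int start, int pageLength) {
        List<String> page = new ArrayList<>();
        for (int i = start; i < start + pageLength && i <= total; i++) {
            page.add("/doc/" + i + ".xml");
        }
        return page;
    }

    // Same loop shape as above: keep paging until a short page arrives.
    static List<String> collectAll(int total, int pageLength) {
        List<String> uris = new ArrayList<>();
        int start = 1;
        boolean hasMore = true;
        while (hasMore) {
            List<String> page = fetchPage(total, start, pageLength);
            uris.addAll(page);
            start += page.size();
            // this was the last page if it is smaller than pageLength
            hasMore = page.size() == pageLength;
        }
        return uris;
    }

    public static void main(String[] args) {
        System.out.println(collectAll(25, 10).size()); // prints 25
    }
}
```

Note that when the total is an exact multiple of pageLength, the loop makes one extra request that returns an empty page; that is harmless and keeps the termination condition simple.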

The transaction is only necessary if you need a guaranteed "snapshot" list isolated from adds and deletes happening concurrently with this process. Since it adds some overhead, feel free to remove it if you don't need that exactness.

Find out the page length, and in the queryMgr you can specify the starting point for each request. Keep increasing the starting point and loop through all the URIs. I was able to fetch all of them this way. It may not be the best approach, but it works.

    List<String> uriList = new ArrayList<>();
    QueryManager queryMgr = client.newQueryManager();
    StructuredQueryBuilder qb = new StructuredQueryBuilder();
    // restrict to whichever collections you need
    StructuredQueryDefinition querydef =
        qb.and(qb.collection("xxxx"), qb.collection("whatever"), qb.collection("whatever")); // outputs 241152
    // run one search just to learn the page length and the total number of results
    SearchHandle results = queryMgr.search(querydef, new SearchHandle(), 1);
    long pageLength = results.getPageLength();
    long totalResults = results.getTotalResults();
    System.out.println("Total Results: " + totalResults);
    // MarkLogic result positions are 1-based; advance one page per iteration
    for (long start = 1; start <= totalResults; start += pageLength) {
        System.out.println("Printing results from: " + start + " to: " + (start + pageLength - 1));
        results = queryMgr.search(querydef, new SearchHandle(), start);
        MatchDocumentSummary[] summaries = results.getMatchResults();
        for (MatchDocumentSummary summary : summaries) {
            uriList.add(summary.getUri());
        }
        if (uriList.size() >= 1000) { // optional cap on how many URIs to collect
            break;
        }
    }
    uriList = uriList.stream().distinct().collect(Collectors.toList());
    return uriList;
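The paging arithmetic can be sanity-checked in isolation. This small sketch (class and method names are hypothetical) computes the 1-based start positions that a paged search would request for a given result count and page length:

```java
import java.util.ArrayList;
import java.util.List;

public class PageStarts {

    // 1-based start positions for paging through totalResults hits,
    // pageLength at a time.
    static List<Long> pageStarts(long totalResults, long pageLength) {
        List<Long> starts = new ArrayList<>();
        for (long start = 1; start <= totalResults; start += pageLength) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        System.out.println(pageStarts(25, 10)); // prints [1, 11, 21]
    }
}
```

For 25 results and a page length of 10 this requests pages starting at 1, 11, and 21, so the last page simply comes back short rather than requiring any special-case arithmetic.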
