
Delete multiple documents

The following code is working but extremely slow. Up till the search function all goes well. First, the search function returns a sequence and not an array (why?!). Second, the array consists of nodes and I need URIs for the delete. And third, the deleteDocument function takes a string and not an array of URIs.

What would be a better way to do this? I need to delete documents that are more than a year old.

Here I use xdmp.log instead of xdmp.documentDelete, just to be safe.

var now      = new Date();
var yearBack = now.setDate(now.getDate() - 365); 

var date = new Date(yearBack);
var b    = cts.jsonPropertyRangeQuery("Dtm", "<", date);
var c    = cts.search(b, ['unfiltered']).toArray();

// c is a JavaScript array here, so use .length rather than fn.count.
for (var i = 0; i < c.length; i++) {
  xdmp.log(fn.documentUri(c[i]), "info");
}

Doing the same with cts.uris:

var now      = new Date();
var yearBack = now.setDate(now.getDate() - 365);

var date = new Date(yearBack);
var b    = cts.jsonPropertyRangeQuery("Dtm", "<", date);
var c    = cts.uris("", [], b);

while (true) {
  var uri = c.next();

  if (uri.done) {
    break;
  }

  xdmp.log(uri.value, "info");
}

HTH!

Using toArray will work, but it is most likely where your slowness is. The cts.search() function returns an iterator, so all you have to do is loop over it and do your deleting until there are no more items in it. Also, you might want to limit your search to 1,000 items: a transaction with a large number of deletes will take a while and might time out.

Here is an example of looping over the iterator:

var now      = new Date();
var yearBack = now.setDate(now.getDate() - 365);

var date = new Date(yearBack);
var b    = cts.jsonPropertyRangeQuery("Dtm", "<", date);
var c    = cts.search(b, ['unfiltered']);

while (true) {
  var doc = c.next();

  if (doc.done) {
    break;
  }

  // doc is an iterator result object; the node itself is in doc.value.
  xdmp.log(fn.documentUri(doc.value), "info");
}

Here is an example if you want to limit to the first 1,000:

fn.subsequence(cts.search(b, ['unfiltered']), 1, 1000);
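Putting those two pieces together, here is a minimal sketch of the actual delete, assuming the same query b from the question and swapping xdmp.documentDelete in for the xdmp.log placeholder:

// Delete at most 1,000 matching documents in this transaction.
var page = fn.subsequence(cts.search(b, ['unfiltered']), 1, 1000);

for (var doc of page) {
  // xdmp.documentDelete takes a single URI string.
  xdmp.documentDelete(fn.documentUri(doc));
}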

Several things to consider. 1) If you are searching for the purpose of deleting, or for anything else that doesn't require the document body, using a search that returns URIs instead of nodes can be much faster. If that isn't convenient, then getting the URI as close to the search expression as possible can achieve similar results. You want to avoid making the server fetch and expand the document just to get the URI so you can delete it. For example:
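Here is a sketch using cts.uris, as in the question's second snippet (this assumes the URI lexicon is enabled on the database), so only URI strings come back and no document body is ever expanded:

// cts.uris yields URI strings directly; no node needs expanding.
for (var uri of cts.uris("", [], b)) {
  xdmp.documentDelete(uri);
}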

2) While there is full coverage in the JavaScript APIs for all MarkLogic features, the JavaScript APIs are based on the same underlying functions that the XQuery APIs use. It's useful to understand that, and to take a look at the equivalent XQuery API docs to get the big picture. For example, arrays vs. iterators: if the JS search APIs returned arrays, it could be a huge performance problem, because the underlying code is based on 'lazy evaluation' of sequences. A search could return 1 million rows, but if you only look at the first one, the server can often avoid accessing the remaining 999,999 documents. Similarly, as you iterate, only the in-scope referenced data needs to be available. If the results had to be put into an array, then all of them would have to be pre-fetched and put in memory upfront. For instance:
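A small illustration of that laziness (fn.head takes just the first item of a sequence):

// Lazy: only the first matching document is ever fetched.
var first = fn.head(cts.search(b, ['unfiltered']));

// Eager: toArray() forces every match to be materialized in memory.
var all = cts.search(b, ['unfiltered']).toArray();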

3) Always keep in mind that operations which return lists of things may only be bounded by how big your database is. That is why cts.search() and other functions have built-in 'pagination'. You should code for that from the start. By reading the user guides you can get a better understanding of not only how to do something, but how to do it efficiently, or even at all, once your database becomes larger than memory. In general it's a good idea to always code for paginated results: it is a lot more efficient, and your code will still work just as well after you add 100 docs or a million. For example:
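Here is a sketch of that pagination pattern, again with xdmp.log standing in for the real work (the page size of 1,000 is just an assumption):

var pageSize = 1000;
var start    = 1;

while (true) {
  // Fetch one fixed-size page of results at a time.
  var page = fn.subsequence(cts.search(b, ['unfiltered']), start, pageSize).toArray();

  if (page.length === 0) {
    break;  // no more results
  }

  for (var i = 0; i < page.length; i++) {
    xdmp.log(fn.documentUri(page[i]), "info");
  }

  start += pageSize;
}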

4) Take a look at xdmp.nodeUri ( https://docs.marklogic.com/xdmp.nodeUri ). This function, unlike fn.documentUri(), will work on any node, even if it's not a document node. If you can put this right next to the search instead of next to the delete, the system can optimize much better. The examples in the JavaScript guide are a good start: https://docs.marklogic.com/guide/getting-started/javascript#chapter

In your case I suggest something like this to experiment with both pagination and extracting the URIs without having to expand the documents:

var uris = [];
for (var result of fn.subsequence(cts.search( ... ), 1, 100)) {
  uris.push(xdmp.nodeUri(result));
}

for (var i in uris) {
  xdmp.log(uris[i]);
}
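From there, actually removing the documents is one more loop over the collected URIs, substituting xdmp.documentDelete for the log call once you are confident in the results:

for (var i in uris) {
  xdmp.documentDelete(uris[i]);
}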
