
MarkLogic: Removing duplicates using XQuery

I have removed duplicate entries based on a single attribute in the XML. Now I need to remove duplicates by comparing several attributes at once.

Input:
  <Id>
    <tbl_Keysight_Input Auto_Id="66365" Product_No="10070D" Product_Option="10070D"/>
    <tbl_Keysight_Input Auto_Id="66365" Product_No="10070D" Product_Option="10070D"/>
    <tbl_Keysight_Input Auto_Id="66365" Product_No="10070D1" Product_Option="10070D"/>
    <tbl_Keysight_Input Auto_Id="66365" Product_No="10070D1" Product_Option="10070D"/>
    <tbl_Keysight_Input Auto_Id="66365" Product_No="10070D" Product_Option="10070D"/>
  </Id>

Expected output:

  <Id>
    <tbl_Keysight_Input Auto_Id="66365" Product_No="10070D" Product_Option="10070D"/>
    <tbl_Keysight_Input Auto_Id="66365" Product_No="10070D1" Product_Option="10070D"/>
  </Id>

Please provide an XQuery that does this.

The query below deduplicates on Auto_Id only:

(: Keep the first tbl_Keysight_Input for each distinct Auto_Id value.
   Note: XPath is case-sensitive, so the path must use Id, not id. :)
for $d in distinct-values(xdmp:directory("/documents/", "1")//Id/tbl_Keysight_Input/@Auto_Id)
let $items := xdmp:directory("/documents/", "1")//Id/tbl_Keysight_Input[@Auto_Id = $d]
order by $d
return $items[1]
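To see why keying on Auto_Id alone is not enough, the "first element per distinct key" logic of the query above can be mimicked in Python with the standard-library xml.etree.ElementTree, run on the sample input outside MarkLogic. This is only a sketch under that assumption, and first_per_key is a hypothetical helper name. With Auto_Id as the only key, every row shares the key 66365, so only one row survives. With a key built from all three attributes, the two expected rows remain.

```python
import xml.etree.ElementTree as ET

# Sample input from the question.
SAMPLE = """<Id>
  <tbl_Keysight_Input Auto_Id="66365" Product_No="10070D" Product_Option="10070D"/>
  <tbl_Keysight_Input Auto_Id="66365" Product_No="10070D" Product_Option="10070D"/>
  <tbl_Keysight_Input Auto_Id="66365" Product_No="10070D1" Product_Option="10070D"/>
  <tbl_Keysight_Input Auto_Id="66365" Product_No="10070D1" Product_Option="10070D"/>
  <tbl_Keysight_Input Auto_Id="66365" Product_No="10070D" Product_Option="10070D"/>
</Id>"""


def first_per_key(elements, attrs):
    """Hypothetical helper: keep the first element for each distinct tuple of attribute values."""
    seen = set()
    kept = []
    for el in elements:
        key = tuple(el.get(a) for a in attrs)  # composite key; a missing attribute gives None
        if key not in seen:
            seen.add(key)
            kept.append(el)
    return kept


rows = ET.fromstring(SAMPLE).findall("tbl_Keysight_Input")

by_auto_id = first_per_key(rows, ["Auto_Id"])
by_all = first_per_key(rows, ["Auto_Id", "Product_No", "Product_Option"])

print(len(by_auto_id))  # 1: the 10070D1 row is lost
print([el.get("Product_No") for el in by_all])  # ['10070D', '10070D1']
```

Unlike the XQuery, this sketch keeps document order instead of sorting by key, which makes no difference for this sample.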

Assuming that all elements to be compared reside within the same parent element, you can check, for each tbl_Keysight_Input, whether any preceding sibling is deep-equal to it, and return only the tbl_Keysight_Input elements that have no deep-equal preceding sibling. For each group of elements with the same attributes, only the first element is returned, because it is the only one with no preceding duplicate.

I don't have MarkLogic to test this, but the following should illustrate the idea in XQuery:

(: Return an element only if no earlier sibling with the same name is deep-equal to it. :)
for $x in xdmp:directory("/documents/", "1")/Id/tbl_Keysight_Input
where fn:empty($x/preceding-sibling::tbl_Keysight_Input[fn:deep-equal(., $x)])
return $x
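The preceding-sibling idea can be checked outside MarkLogic with a small Python sketch that uses xml.etree.ElementTree on the sample input. The deep_equal helper here is a hypothetical, simplified stand-in for fn:deep-equal that compares the tag, the attributes, the trimmed text, and the children recursively. It is not a faithful reimplementation of the fn:deep-equal rules.

```python
import xml.etree.ElementTree as ET

# Sample input from the question.
SAMPLE = """<Id>
  <tbl_Keysight_Input Auto_Id="66365" Product_No="10070D" Product_Option="10070D"/>
  <tbl_Keysight_Input Auto_Id="66365" Product_No="10070D" Product_Option="10070D"/>
  <tbl_Keysight_Input Auto_Id="66365" Product_No="10070D1" Product_Option="10070D"/>
  <tbl_Keysight_Input Auto_Id="66365" Product_No="10070D1" Product_Option="10070D"/>
  <tbl_Keysight_Input Auto_Id="66365" Product_No="10070D" Product_Option="10070D"/>
</Id>"""


def deep_equal(a, b):
    """Simplified stand-in for fn:deep-equal: same tag, same attributes (order-insensitive),
    same trimmed text and recursively equal children. Tail text is ignored."""
    if a.tag != b.tag or a.attrib != b.attrib:
        return False
    if (a.text or "").strip() != (b.text or "").strip():
        return False
    ca, cb = list(a), list(b)
    return len(ca) == len(cb) and all(deep_equal(x, y) for x, y in zip(ca, cb))


def without_preceding_duplicates(parent, tag):
    """Mirror of: parent/tag[empty(preceding-sibling::tag[deep-equal(., current)])]."""
    siblings = parent.findall(tag)
    return [
        x for i, x in enumerate(siblings)
        if not any(deep_equal(prev, x) for prev in siblings[:i])
    ]


result = without_preceding_duplicates(ET.fromstring(SAMPLE), "tbl_Keysight_Input")
for el in result:
    print(ET.tostring(el, encoding="unicode").strip())
```

As with preceding-sibling::, the comparison only sees elements under the same Id parent, so duplicates in different documents are not detected.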

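The easiest way to compare and filter is to use fn:deep-equal(). Because you have a directory of XML documents and want to compare these elements across documents, you may need a temporary XML structure: the preceding-sibling:: axis only sees elements within the same parent, so elements in different documents are never compared with each other.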

You could select all of the tbl_Keysight_Input elements and put them into a temporary element, so that they all share the same parent. Then iterate through each tbl_Keysight_Input element and use fn:deep-equal() in a predicate to keep only the first element of each group of duplicates.

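The following works, but depending on the number of documents in the directory and the number of tbl_Keysight_Input elements they contain, it might not scale. Every element is copied into the temporary structure, and each one is compared against all of its preceding siblings, so the number of comparisons grows quadratically with the number of elements.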

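(: Copy all elements into one temporary parent so that elements from different
   documents become siblings, then drop any element that repeats an earlier sibling. :)
for $x in <temp>{xdmp:directory("/documents/", "1")/Id/tbl_Keysight_Input}</temp>/*
where fn:empty($x/preceding-sibling::*[fn:deep-equal(., $x)])
return $x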
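To illustrate both the temporary wrapper and the scaling concern, here is a Python sketch (xml.etree.ElementTree, outside MarkLogic) that merges the elements of several documents under one temp element and counts the equality checks. The two documents in DOCS, the Auto_Id value 70000, and the function name dedupe_across_documents are made up for illustration. Equality is reduced to "same tag and same attributes", which is only adequate for attribute-only elements like these.

```python
import xml.etree.ElementTree as ET

# Hypothetical documents standing in for the contents of /documents/.
DOCS = [
    """<Id>
  <tbl_Keysight_Input Auto_Id="66365" Product_No="10070D" Product_Option="10070D"/>
  <tbl_Keysight_Input Auto_Id="66365" Product_No="10070D1" Product_Option="10070D"/>
</Id>""",
    """<Id>
  <tbl_Keysight_Input Auto_Id="66365" Product_No="10070D" Product_Option="10070D"/>
  <tbl_Keysight_Input Auto_Id="70000" Product_No="10070D" Product_Option="10070D"/>
</Id>""",
]


def dedupe_across_documents(docs, tag):
    """Return (unique elements, number of equality checks performed)."""
    # Gather every matching element under one temporary parent, like <temp>{...}</temp>,
    # so that elements from different documents become siblings.
    temp = ET.Element("temp")
    for doc in docs:
        temp.extend(ET.fromstring(doc).findall(tag))

    children = list(temp)
    unique, checks = [], 0
    for i, x in enumerate(children):
        duplicate = False
        for prev in children[:i]:  # the preceding siblings of x inside <temp>
            checks += 1
            # Simplified equality: adequate only for attribute-only elements.
            if prev.tag == x.tag and prev.attrib == x.attrib:
                duplicate = True
                break
        if not duplicate:
            unique.append(x)
    return unique, checks


unique, checks = dedupe_across_documents(DOCS, "tbl_Keysight_Input")
print(len(unique), checks)  # 3 5

# Worst case: n distinct elements need n * (n - 1) / 2 checks.
many = "<Id>" + "".join(
    f'<tbl_Keysight_Input Auto_Id="{i}" Product_No="P" Product_Option="O"/>'
    for i in range(200)
) + "</Id>"
_, worst = dedupe_across_documents([many], "tbl_Keysight_Input")
print(worst)  # 19900
```

When most elements are distinct, almost every element is compared with all of its predecessors, which is why this approach can become slow on large directories.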

The technical post webpages of this site follow the CC BY-SA 4.0 license.

 