
Split a document using the MarkLogic Flow Editor

I am trying to split my incoming documents using "Information Studio Flows" (MarkLogic v 8.0-1.1). The problem is in the "Transform" section.

This is the document I am importing. For simplicity, I have reduced its content to a single stwtext element:

   <docs>
     <stwtext id="RD-10-00258" update="03.2011" seq="RQ-10-00001">
       <head>
         <ti>
           <i>j</i>
         </ti>
         <ff-list>
           <ff id="0103"/>
         </ff-list>
       </head>
       <p> Symbol für die <vw idref="RD-19-04447">Stromdichte</vw> . </p>
     </stwtext>
   </docs>

This is the content of my "XQuery transform":

   xquery version "1.0-ml";
   (: Copyright 2002-2015 MarkLogic Corporation. All Rights Reserved. :)

   (:
   :: Custom action. It must be a CPF action module.
   :: Replace this text completely, or use it as a template and
   :: add imports, declarations,
   :: and code between START and END comment tags.
   :: Uses the external variables:
   ::    $cpf:document-uri: The document being processed
   ::    $cpf:transition: The transition being executed
   :)

   import module namespace cpf = "http://marklogic.com/cpf"
     at "/MarkLogic/cpf/cpf.xqy";

   (: START custom imports and declarations; imports must be in Modules/ on filesystem :)

   (: END custom imports and declarations :)

   declare option xdmp:mapping "false";

   declare variable $cpf:document-uri as xs:string external;
   declare variable $cpf:transition as node() external;

   if ( cpf:check-transition($cpf:document-uri,$cpf:transition))
   then
     try {
       (: START your custom XQuery here :)

       let $doc := fn:doc($cpf:document-uri)
       return
         xdmp:eval(
           for $wpt in fn:doc($doc)//stwtext
           return
             xdmp:document-insert(
               fn:concat("/rom-data/", fn:concat($wpt/@id,".xml")),
               $wpt
             )
         )

       (: END your custom XQuery here :)
       ,
       cpf:success( $cpf:document-uri, $cpf:transition, () )
     }
     catch ($e) {
       cpf:failure( $cpf:document-uri, $cpf:transition, $e, () )
     }
   else ()

When I run this snippet, I get the error:

Invalid URI format

with this long description:

   XDMP-URI: (err:FODC0005) fn:doc(fn:doc("/8122584828241226495/12835482492021535301/URI=/content/home/admin/Vorlagen/testing/v10.new-ML.xml")) -- Invalid URI format: "&#10;&#9;&#10;&#9; &#10;&#9;&#9;&#10;&#9;&#9;&#9;&#10;&#9;&#9;&#9;&#9;j&#10;&#9;&#9;&#9;&#10;&#9;&#9;&#9;&#10;&#9;&#9;&#9;&#9;&#10;&#9;&#9;&#9;&#10;&#9;&#9;&#10;&#9;&#9;&#10;&#9;&#9;&#9;Symbol f&#252;r die&#10;&#9;&#9;&#9;Stromdichte&#9;&#9;&#9;&#10;&#9;&#9;&#10;&#9;&#10;&#10;&#10;&#10;"
   In /18200382103958065126.xqy on line 37
   In xdmp:invoke("/18200382103958065126.xqy", (xs:QName("trgr:uri"), "/8122584828241226495/12835482492021535301/URI=/content/home/admi...", xs:QName("trgr:trigger"), ...), <options xmlns="xdmp:eval"><isolation>different-transaction</isolation><prevent-deadlocks>t...</options>)
     $doc = fn:doc("/8122584828241226495/12835482492021535301/URI=/content/home/admin/Vorlagen/testing/v10.new-ML.xml")
   In /MarkLogic/cpf/triggers/internal-cpf.xqy on line 179
   In execute-action("on-state-enter", "http://marklogic.com/states/initial", "/8122584828241226495/12835482492021535301/URI=/content/home/admi...", (xs:QName("trgr:uri"), "/8122584828241226495/12835482492021535301/URI=/content/home/admi...", xs:QName("trgr:trigger"), ...), <options xmlns="xdmp:eval"><isolation>different-transaction</isolation><prevent-deadlocks>t...</options>, (fn:doc("http://marklogic.com/cpf/pipelines/14379829270688061297.xml")/p:pipeline, fn:doc("http://marklogic.com/cpf/pipelines/15861601524191348323.xml")/p:pipeline), fn:doc("http://marklogic.com/cpf/pipelines/15861601524191348323.xml")/p:pipeline/p:state-transition[1]/p:default-action, fn:doc("http://marklogic.com/cpf/pipelines/15861601524191348323.xml")/p:pipeline/p:state-transition[1])
     $caller = "on-state-enter"
     $state-or-status = "http://marklogic.com/states/initial"
     $uri = "/8122584828241226495/12835482492021535301/URI=/content/home/admi..."
     $vars = (xs:QName("trgr:uri"), "/8122584828241226495/12835482492021535301/URI=/content/home/admi...", xs:QName("trgr:trigger"), ...)
     $invoke-options = <options xmlns="xdmp:eval"><isolation>different-transaction</isolation><prevent-deadlocks>t...</options>
     $pipelines = (fn:doc("http://marklogic.com/cpf/pipelines/14379829270688061297.xml")/p:pipeline, fn:doc("http://marklogic.com/cpf/pipelines/15861601524191348323.xml")/p:pipeline)
     $action-to-execute = fn:doc("http://marklogic.com/cpf/pipelines/15861601524191348323.xml")/p:pipeline/p:state-transition[1]/p:default-action
     $chosen-transition = fn:doc("http://marklogic.com/cpf/pipelines/15861601524191348323.xml")/p:pipeline/p:state-transition[1]
     $raw-module-name = "/18200382103958065126.xqy"
     $module-kind = "xquery"
     $module-name = "/18200382103958065126.xqy"
   In /MarkLogic/cpf/triggers/internal-cpf.xqy on line 320

I thought the problem was the "Document setting" in the "Load" section of the Flow Editor:

URI=/content{$path}/{$filename}{$dot-ext}

But if I remove it, I receive the same error.

I have no idea what to do; I am really new to this. Please help.

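First of all, Information Studio has been deprecated in MarkLogic 8. I would strongly recommend looking into the aggregate record feature of MarkLogic Content Pump (the -aggregate_record_element option, used with -input_file_type aggregates), which splits a large XML file into one document per record element: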

http://docs.marklogic.com/guide/ingestion/content-pump#id_65814

Apart from that, there are several issues with your code. You are calling fn:doc twice, which effectively tries to interpret the document's contents as a URI; that is what the XDMP-URI error is reporting. There is also an unnecessary xdmp:eval wrapping the FLWOR expression, and xdmp:eval expects a string as its first parameter. I think you can shorten it to the following (showing only the inner part of the action):

   (: START your custom XQuery here :)

   let $doc := fn:doc($cpf:document-uri)
   for $wpt in $doc//stwtext
   return
     xdmp:document-insert(
       fn:concat("/roempp-data/", fn:concat($wpt/@id,".xml")),
       $wpt
     )

   (: END your custom XQuery here :)
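
For context, here is a minimal sketch of how that inner part might sit inside the CPF action template from the question. It is assembled only from the template and the snippet above, not tested against a live flow, and it assumes every stwtext element carries a unique id attribute (two elements with the same id would target the same URI in one transaction). The outer parentheses around the FLWOR expression are there for readability only.

```xquery
xquery version "1.0-ml";

import module namespace cpf = "http://marklogic.com/cpf"
  at "/MarkLogic/cpf/cpf.xqy";

declare option xdmp:mapping "false";

declare variable $cpf:document-uri as xs:string external;
declare variable $cpf:transition as node() external;

if (cpf:check-transition($cpf:document-uri, $cpf:transition)) then
  try {
    (
      (: $cpf:document-uri is already a URI string, so fn:doc is called once :)
      let $doc := fn:doc($cpf:document-uri)
      (: one new document per stwtext element; assumes @id is unique :)
      for $wpt in $doc//stwtext
      return
        xdmp:document-insert(
          fn:concat("/roempp-data/", $wpt/@id, ".xml"),
          $wpt
        )
    ),
    (: report success to CPF once the inserts have been issued :)
    cpf:success($cpf:document-uri, $cpf:transition, ())
  }
  catch ($e) {
    cpf:failure($cpf:document-uri, $cpf:transition, $e, ())
  }
else ()
```

Written this way, the split documents land in the same database as the incoming document, since there is no xdmp:eval with a database option.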

HTH!

Many thanks, @grtjn. This is my approach; it is practically the same solution:

   (: START your custom XQuery here :)
   xdmp:log(fn:doc($cpf:document-uri), "debug"),
   let $doc := fn:doc($cpf:document-uri)
   return
     xdmp:eval('
       declare variable $doc external;
       for $wpt in $doc//stwtext
       return (
         xdmp:document-insert(
           fn:concat("/roempp-data/", fn:concat($wpt/@id,".xml")),
           $wpt,
           xdmp:default-permissions(),
           "roempp-data"
         )
       )',
       (xs:QName("doc"), $doc),
       <options xmlns="xdmp:eval">
         <database>{xdmp:database("roempp-tutorial")}</database>
       </options>
     )
   (: END your custom XQuery here :)

OK, now it works. However, I found that after loading finishes, I see two documents in MarkLogic:

  1. my split document "/rom-data/RD-10-00258.xml" with a single root element "stwtext" (as desired)
  2. the original document "URI=/content/home/admin/Vorlagen/testing/v10.new-ML.xml" with root element "docs"

Is it possible to prevent the original document from being inserted?
