
MarkLogic TDE views and joins

How do MarkLogic TDE views perform joins between two views?

I have created two simple TDE templates that share one join column. I can run a select query with a join and it works fine. My question is: what kind of join is actually being performed in the MarkLogic database? Is it doing full document scans that will become a bottleneck once the volume of data grows?

xquery version "1.0-ml";
import module namespace tde = "http://marklogic.com/xdmp/tde"  at "/MarkLogic/tde.xqy";

let $emp1 := <employee><id>100</id><name>john</name><dept>10</dept></employee>
let $emp2 := <employee><id>200</id><name>mary</name><dept>10</dept></employee>
let $dept1 := <dept><id>10</id><name>accounting</name></dept>
let $dept2 := <dept><id>20</id><name>hr</name></dept>
let $emp-table := <template xmlns="http://marklogic.com/xdmp/tde">
  <context>/employee</context>
  <rows>
    <row>
      <schema-name>models</schema-name>
      <view-name>employees</view-name>
      <columns>
        <column>
          <name>id</name>
          <scalar-type>string</scalar-type>
          <val>id</val>
        </column>
        <column>
          <name>name</name>
          <scalar-type>string</scalar-type>
          <val>name</val>
        </column>
        <column>
          <name>dept</name>
          <scalar-type>string</scalar-type>
          <val>dept</val>
        </column>
      </columns>
    </row>
  </rows>
</template>      
let $dept-table := <template xmlns="http://marklogic.com/xdmp/tde">
  <context>/dept</context>
  <rows>
    <row>
      <schema-name>models</schema-name>
      <view-name>depts</view-name>
      <columns>
        <column>
          <name>id</name>
          <scalar-type>string</scalar-type>
          <val>id</val>
        </column>
        <column>
          <name>name</name>
          <scalar-type>string</scalar-type>
          <val>name</val>
        </column>
      </columns>
    </row>
  </rows>
</template>              
return (
  xdmp:document-insert('/employees/100.xml', $emp1),
  xdmp:document-insert('/employees/200.xml', $emp2),
  xdmp:document-insert('/depts/10.xml', $dept1),
  xdmp:document-insert('/depts/20.xml', $dept2),
  tde:template-insert('/templates/emp.xml', $emp-table),
  tde:template-insert('/templates/dept.xml', $dept-table)
)  

Then I run:

select employees.name, depts.name from employees, depts where employees.dept = depts.id

The select works great.

My question is what is happening under the hood. Is it doing the equivalent of a hash join, or a full table scan? What are the implications if the number of documents grows to millions or billions?

You can use xdmp:sql-plan to better understand how your query gets executed.
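As a minimal sketch, the call could look like this for the query in the question. It assumes the two views from the question already exist. The plan:join element and join-type attribute names are taken from the sample plan in this answer, so other MarkLogic versions may structure the plan differently.

```xquery
xquery version "1.0-ml";
declare namespace plan = "http://marklogic.com/plan";

(: The join query from the question; xdmp:sql-plan returns its plan instead of its rows. :)
let $sql := "select employees.name, depts.name from employees, depts where employees.dept = depts.id"
let $plan := xdmp:sql-plan($sql)
return (
  (: The full plan, for reading by eye. :)
  $plan,
  (: The join strategy of every join node found in the plan (assumed element/attribute names). :)
  $plan//plan:join/@join-type/string()
)
```

The second item in the result is only a convenience. If it comes back empty, read the full plan to see how joins are represented on your version.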

A similar query to yours on my machine reveals that a bloom-join is used. The join type may differ on your machine depending on your query, but the SQL plan will show you what is actually going on.

<plan:plan xmlns:plan="http://marklogic.com/plan">
  <plan:select>
    <plan:project order="">
      <plan:vars>...</plan:vars>
      <plan:expr>
        <plan:join join-type="bloom-join" order="40[NULLS_IRRELEVANT]">
          <plan:join-info>
            <plan:hash left="4" right="1" operator="="></plan:hash>
            <plan:filters>...</plan:filters>
          </plan:join-info>
          <plan:elems>...</plan:elems>
          <plan:filters>..</plan:filters>
        </plan:join>
      </plan:expr>
    </plan:project>
  </plan:select>
</plan:plan>

