
Counting word occurrences is slow in BaseX XQuery

I want to count the occurrences of words in an XML document. My query returns the correct counts, but it runs slowly.

There are only two XML files (236 KB and 155 KB), yet it takes 17 seconds to produce the result.

Below is the query:

let $doc := db:open('test','/ieee/test.xml')
let $tokens := $doc//text()/fn:tokenize(fn:normalize-space(.),'\s')
let $stringtoken := for $x at $pos in $tokens[position() = 1 to fn:last()-1]
                    let $y := string-join($tokens[position() = $pos to $pos + 1],' ')
                    return $y
return
<results>
  {
    for $result in distinct-values($stringtoken)
    let $count := count($stringtoken[. = $result])
    return
      <term word="{$result}" count="{$count}"></term>
  }
</results>

In the query above, the line let $count := count($stringtoken[. = $result]) takes most of the time.

Any suggestions for improving the performance of this code would be much appreciated.

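The group by clause will speed up your query a lot. Your version filters the whole $stringtoken sequence once for every distinct value, so the work grows roughly with the number of distinct values times the number of tokens. With group by, equal tokens are collected together and each group is simply counted. Replace the return clause of your query with: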

return <results>{
  for $grouped-token in $stringtoken
  group by $token := $grouped-token
  let $count := count($grouped-token)
  return <term word="{ $token }" count="{ $count }"/>
}</results>
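
For reference, here is a minimal, self-contained sketch of the whole query with group by in place. It makes a few assumptions that are not part of the original post: the inline sample document stands in for db:open('test','/ieee/test.xml'), the two-word strings are built with direct positional access instead of a positional range filter, and the order by clause is an optional addition for readable output. It needs XQuery 3.0 (for group by and the || operator), which BaseX supports.

```xquery
xquery version "3.0";

(: Hypothetical inline sample; in the question this would be
   db:open('test', '/ieee/test.xml'). :)
let $doc := document { <doc><p>a b a b</p><p>a b c</p></doc> }

(: Split every text node into whitespace-separated tokens. :)
let $tokens := $doc//text()/tokenize(normalize-space(.), '\s')

(: Build the two-word strings by direct positional access. :)
let $stringtoken :=
  for $pos in 1 to count($tokens) - 1
  return $tokens[$pos] || ' ' || $tokens[$pos + 1]

return <results>{
  for $grouped-token in $stringtoken
  group by $token := $grouped-token
  let $count := count($grouped-token)
  (: Optional addition, not in the answer: most frequent first. :)
  order by $count descending, $token
  return <term word="{ $token }" count="{ $count }"/>
}</results>
```

For the sample above, the tokens are a, b, a, b, a, b, c, so the string "a b" should be counted three times, "b a" twice and "b c" once. Note that, as in the original query, the strings being counted are pairs of adjacent words rather than single words, and a pair can span two neighbouring text nodes.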


 